Lazy Skill Loading Refactor

Overview

Refactored the agent skill system to implement lazy loading following the moltbot pattern. This significantly reduces token usage and improves agent performance by only loading full skill documentation when needed.

Problem

Previously, the entire SKILL.md content for all configured skills was included in the agent's system prompt. This:

Wasted tokens on skills the agent might not use
Reduced context window available for actual conversation
Decreased agent performance due to prompt bloat

Solution

Implemented lazy loading pattern inspired by moltbot:

System prompt: Only includes skill names + short descriptions
Agent decision: Agent chooses which skill to use based on descriptions
On-demand loading: Agent calls read_skill tool to get full SKILL.md
Execution: Agent follows SKILL.md instructions (may be multi-turn)
Summary: Agent returns final result to user

Changes Made

1. Modified `AgentSkillReference.format_for_prompt()`

File: backend/agent_framework/skill_manager.py

Changed from returning full SKILL.md content to only name + description:

def format_for_prompt(self) -> str:
    """Format skill for inclusion in agent prompt.
    
    Only includes name and description - NOT the full SKILL.md content.
    Agent must use read_skill tool to get full documentation.
    """
    return f"- {self.name}: {self.description}"

2. Updated `SkillManager.format_skills_for_prompt()`

File: backend/agent_framework/skill_manager.py

Updated to follow moltbot pattern with clear instructions:

def format_skills_for_prompt(self) -> str:
    """Format Agent Skills for inclusion in agent prompt.
    
    Following moltbot pattern: Only include skill names and descriptions.
    Agent must use read_skill tool to get full SKILL.md content.
    """
    # ... code ...
    
    prompt = f"""

## Skills (mandatory)

Before replying: scan available skills below.
- If exactly one skill clearly applies: read its SKILL.md with `read_skill`, then follow it.
- If multiple could apply: choose the most specific one, then read/follow it.
- If none clearly apply: do not read any SKILL.md.

Constraints: never read more than one skill up front; only read after selecting.

Available skills:
{skills_list}

"""

3. Added `read_skill` Tool

File: backend/agent_framework/tools/read_skill_tool.py

Created new tool that allows agents to read full SKILL.md content on demand:

class ReadSkillTool(BaseTool):
    """Tool for reading complete skill documentation.
    
    This tool allows the agent to read the full SKILL.md content and
    example code for a specific skill when needed.
    """
    
    name: str = "read_skill"
    description: str = """Read the complete documentation (SKILL.md) for a specific skill.

Use this tool when:
- You've decided to use a specific skill
- You need to understand how to use the skill
- You need to see example code or configuration

Input: skill_name (e.g., "my_cal", "weather-forcast")
Output: Complete SKILL.md content with usage instructions and examples"""

4. Integrated `read_skill` Tool in Agent Initialization

File: backend/agent_framework/base_agent.py

Added read_skill tool to agent's toolset:

# Add read_skill tool (for lazy loading of Agent Skills)
from agent_framework.tools.read_skill_tool import create_read_skill_tool
read_skill_tool = create_read_skill_tool(
    agent_id=self.config.agent_id,
    user_id=self.config.owner_user_id
)
self.tools.append(read_skill_tool)

5. Simplified System Prompt

File: backend/agent_framework/base_agent.py

Removed over-emphasis on code_execution tool:

def _create_system_prompt(self) -> str:
    # ... code ...
    
    # Simplified tool listing - no special treatment for code_execution
    if self.tools and not self.llm_supports_tools:
        tools_prompt = "\n\n## Available Tools\n\n"
        tools_prompt += "You have access to the following tools. Use them when appropriate:\n\n"
        
        for tool in self.tools:
            tools_prompt += f"- {tool.name}: {tool.description}\n"

6. Added Multi-Turn Execution with Recursion Limits

File: backend/agent_framework/base_agent.py

Added support for multi-turn tool execution with configurable limits:

# Compile the agent graph with recursion limit
self.agent = builder.compile(
    checkpointer=None,
    interrupt_before=None,
    interrupt_after=None,
    debug=False
)

# Set recursion limit based on max_iterations
# Each iteration can have: LLM call -> Tool call -> LLM call
# So we need at least 2x max_iterations for the recursion limit
self.recursion_limit = self.config.max_iterations * 2

# Use recursion limit when invoking
result = self.agent.invoke(
    {"messages": [HumanMessage(content=user_message)]},
    config={"recursion_limit": self.recursion_limit}
)

7. Updated System Prompt for Multi-Turn Guidance

File: backend/agent_framework/base_agent.py

Added clear guidelines for when agent should complete tasks:

## Task Execution Guidelines

When solving problems:
1. Analyze the user's request carefully
2. Use available tools when appropriate
3. Execute multiple steps if needed (you can call tools multiple times)
4. Provide clear and helpful responses
5. If you need more information from the user, ask clarifying questions

## When to Complete

You should complete the task when:
- You have successfully accomplished what the user asked for
- You have gathered all necessary information and provided a complete answer
- You need additional input from the user (ask a question and wait)

You can use tools multiple times in sequence to accomplish complex tasks. 
The system allows up to {max_iterations} iterations.

Architecture

Two Types of Skills

LangChain Tool: Lightweight, single-turn, fast execution
- Bound directly to LLM
- Example: my_cal (calculator)
- No lazy loading needed (already lightweight)
Agent Skill: Heavyweight, multi-turn, complex workflows
- Requires reading SKILL.md documentation
- Example: weather-forcast (multi-step workflow)
- Uses lazy loading pattern

Workflow

User Request
    ↓
Agent analyzes request
    ↓
Agent scans skill descriptions (in prompt)
    ↓
Agent selects appropriate skill (if needed)
    ↓
Agent calls read_skill("skill-name") (if needed)
    ↓
read_skill returns full SKILL.md
    ↓
Agent follows SKILL.md instructions
    ↓
Agent may execute code, call APIs, etc. (multi-turn)
    ↓
Agent may call more tools if needed (up to max_iterations)
    ↓
Agent decides task is complete
    ↓
Agent returns final summary to user

Multi-Turn Execution

The agent can execute multiple tool calls in sequence:

LangGraph handles the loop: The agent graph automatically loops between LLM and tool nodes
Recursion limit: Set to max_iterations * 2 to allow multiple LLM→Tool→LLM cycles
Agent decides when done: Agent stops when it has no more tool calls to make
Prevents infinite loops: Recursion limit ensures the agent doesn't loop forever
Configurable: max_iterations can be set per agent (default: 10)

Example multi-turn flow:

User: "What's the weather in Beijing?"
    ↓
Agent: Calls read_skill("weather-forcast")
    ↓
Agent: Reads SKILL.md instructions
    ↓
Agent: Calls code_execution to fetch weather data
    ↓
Agent: Analyzes the data
    ↓
Agent: Formats and returns weather summary

Benefits

Reduced Token Usage: Only load skills that are actually used
Better Performance: Smaller prompts = faster inference
More Context: More room for conversation history
Clearer Intent: Agent explicitly chooses which skill to use
Better Tool Selection: Agent prefers specialized tools (my_cal) over generic ones (code_execution)
Multi-Turn Execution: Agent can execute multiple tool calls in sequence to accomplish complex tasks
Controlled Iterations: Recursion limit prevents infinite loops while allowing flexible execution
Agent-Driven Completion: Agent decides when task is complete or when to ask for user input

Testing

To test the lazy loading:

Test LangChain Tool (should be called directly):

User: "What is 2423 * 1312?"
Expected: Agent calls my_cal tool directly

Test Agent Skill (should read SKILL.md first):

User: "What's the weather in Beijing?"
Expected: 
- Agent calls read_skill("weather-forcast")
- Agent reads SKILL.md
- Agent follows instructions
- Agent returns weather summary

Test Thinking/Content Separation:

User: "Calculate 21312 * 213"
Expected:
- Frontend shows thinking process separately
- Frontend shows final answer in content section
- No mixing of thinking and content

References

Moltbot reference: examples-of-reference/moltbot/src/agents/system-prompt.ts
Original design: docs/backend/agent-skill-integration-design.md
CustomOpenAIChat: backend/llm_providers/custom_openai_provider.py

Future Improvements

Caching: Cache read SKILL.md content within a conversation
Skill Discovery: Allow agent to search for skills by capability
Skill Composition: Allow agent to combine multiple skills
Skill Versioning: Support multiple versions of the same skill

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazy Skill Loading Refactor

Overview

Problem

Solution

Changes Made

1. Modified `AgentSkillReference.format_for_prompt()`

2. Updated `SkillManager.format_skills_for_prompt()`

3. Added `read_skill` Tool

4. Integrated `read_skill` Tool in Agent Initialization

5. Simplified System Prompt

6. Added Multi-Turn Execution with Recursion Limits

7. Updated System Prompt for Multi-Turn Guidance

Architecture

Two Types of Skills

Workflow

Multi-Turn Execution

Benefits

Testing

References

Future Improvements

FilesExpand file tree

lazy-skill-loading-refactor.md

Latest commit

History

lazy-skill-loading-refactor.md

File metadata and controls

Lazy Skill Loading Refactor

Overview

Problem

Solution

Changes Made

1. Modified AgentSkillReference.format_for_prompt()

2. Updated SkillManager.format_skills_for_prompt()

3. Added read_skill Tool

4. Integrated read_skill Tool in Agent Initialization

5. Simplified System Prompt

6. Added Multi-Turn Execution with Recursion Limits

7. Updated System Prompt for Multi-Turn Guidance

Architecture

Two Types of Skills

Workflow

Multi-Turn Execution

Benefits

Testing

References

Future Improvements

1. Modified `AgentSkillReference.format_for_prompt()`

2. Updated `SkillManager.format_skills_for_prompt()`

3. Added `read_skill` Tool

4. Integrated `read_skill` Tool in Agent Initialization