Description
Originally posted by @yanxi0830 on 2025-02-05 22:16 UTC
Category: Ideas
Status: Closed on 2025-02-13 18:29 UTC
-- moving over discussions from #955
Problem
We want to standardize the steps necessary for users to build a ReACT agent that can dynamically interleave generating thoughts with taking task-specific actions.
The current agent orchestration loop requires ad hoc logic for intercepting agent outputs and parsing output messages to fit a ReACT framework (example). This proposes changes to the LlamaStack client SDKs and server APIs for better ergonomics when building a ReACT agent.
Proposed Solution
We want to have the flexibility to configure custom prompts and custom output parsers in agent loop execution.
- Introduce the notion of an OutputParser for parsing outputs from ReACT prompting into a ToolCall agent output.
Our current agent loop with custom tool calls will loop and call tools until there are no more tool responses. In the ReACT framework, an action output typically maps to a tool call. We can reuse the agent loop, adding parsing logic right after the agent output to populate the "action" into a ToolCall to enable ReACT.
- Client Agent SDK [RFC] Client Agent SDK OutputParser llama-stack-client-python#121
- We need to incorporate similar output parsing on the server for ReACT with built-in tools.
- For further generalization: RFC for high-level concept categorization of what defines an Agent Type.
Current Agent Types Summary
- An `Agent` instance is defined by an `AgentConfig`
- An `Agent` instance can be categorized into several classes:
  - Vanilla
    - keeps track of conversation loop history
  - RAG
    - has access to the "builtin::rag" tool
    - we currently force-retrieve context first by explicitly calling the memory tool
  - ToolCalling
    - can be configured to use "builtin::websearch" / "builtin::code_interpreter" / "builtin::wolfram_alpha" / "builtin::filesystem" / etc.
  - ReACT
    - requires a custom output parser to execute "action" outputs as tool calls
| Agent Type | Agent Config Template | System Prompt | Output Parser | Orchestration | Note |
| --- | --- | --- | --- | --- | --- |
| Vanilla | raw `Agent` | instruction | | Conversation Loop | |
| Tool Calling | `toolgroups=["builtin::websearch", "builtin::code_interpreter"]` | default tool prompt + instruction | `decode_assistant_message_from_content` | Loop until there are no more tool calls; pass each tool response as the next turn (built-in and custom tools differ) | |
| RAG | `toolgroups=(builtin::rag, args: {vector_db_ids})`, `force_retrieval=?` | default tool prompt + instruction | | Retrieve context from the RAG tool before calling generation | We should add the ability to force retrieval, plus auto retrieval via model tool calling |
| ReACT | `instructions=react_prompt`, `output_parser=react_output_parser` | ReACT prompting (thought-action-answer) | Parse `action` / `action_input` into `ToolCall` as part of the Agent Response | Loop until there are no more tool calls; pass each tool response as the next turn | |
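The ReACT orchestration column above ("loop until there are no more tool calls, pass each tool response as the next turn") can be sketched as a plain loop. Here `generate`, `execute_tool`, and `output_parser` are hypothetical stand-ins for the model call, the tool runtime, and the proposed OutputParser; the message shape is illustrative, not the SDK's actual types.

```python
def run_react_turn(user_message, generate, execute_tool, output_parser, max_steps=5):
    """Loop until the parser finds no more tool calls, feeding each
    tool response back to the model as the next turn."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        completion = generate(messages)
        messages.append({"role": "assistant", "content": completion})
        tool_call = output_parser(completion)
        if tool_call is None:
            return completion  # final answer: stop the loop
        result = execute_tool(tool_call)
        # Pass the tool response as the next turn, mirroring the
        # current custom-tool agent loop.
        messages.append({"role": "tool", "content": result})
    return messages[-1]["content"]
```

This keeps the existing loop shape intact; the only ReACT-specific step is the `output_parser` call that converts the free-text "action" into a tool invocation.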
Proof of Concept Implementation
- llama-stack-client-python: [RFC] Client Agent SDK OutputParser llama-stack-client-python#121
- llama-stack-apps: feat: ReACT agent example llama-stack-apps#166
- llama-stack: [RFC] Agent Categorization + ReACT Agent OutputParser #955
- llama-models: [RFC] response output type meta-llama/llama-models#272
Migrated from discussion #975: https://github.com/llamastack/llama-stack/discussions/975