Skip to content

[RFC] Agent Categorization + ReACT Agent OutputParser#955

Closed
yanxi0830 wants to merge 5 commits intomainfrom
react_agent
Closed

[RFC] Agent Categorization + ReACT Agent OutputParser#955
yanxi0830 wants to merge 5 commits intomainfrom
react_agent

Conversation

@yanxi0830
Copy link
Copy Markdown
Contributor

@yanxi0830 yanxi0830 commented Feb 4, 2025

Problem

We want to standardize the steps necessary for users to build a ReACT agent with the ability to interleave between generating thoughts and taking task specific actions dynamically.

The current agent orchestration loop requires ad hoc logic for intercepting agent outputs and parsing outputs from output messages to fit a ReACT framework (example). This proposes changes to LlamaStack client SDKs and server APIs for better ergonomics to build an ReACT agent.

Proposed Solution

We want to have the flexibility to configure custom prompts and custom output parsers in agent loop execution.

  1. Introduce the notion of OutputParser for parsing outputs from ReACT prompting into ToolCall agent output.
    Our current agent loop with custom tool calls will loop and call tools until there’s no more tool response. In ReACT framework, action output typically maps to a tool call. We can re-utilize the agent loop, but add a parsing logic right after agent outputs to populate “action” into ToolCall to enable ReACT.
  1. For further generalization: RFC for high-level concept categorization of what defines an Agent Type.

Current Agent Types Summary

  • An Agent instance is defined by an AgentConfig
  • An Agent instance can be categorized into several classes
    • Vanilla
      • keep track of conversation loop history
    • RAG
      • access to "builtin::rag" tool
      • we current first force retrieve context by explicitly calling memory tool
    • ToolCalling
      • could be configured to use "builtin::websearch" / "builtin::code_interpreter" / "builtin::wolfram_alpha" / "builtin::filesystem" / etc
    • ReACT
      • require custom output parser to execute "action" as tool calls
Agent Type Agent Config Template System Prompt Output Parser Orchestration Note
Vanilla raw Agent instruction Conversation Loop
Tool Calling toolgroups=[“builtin::websearch”, “builtin::code_interpreter”] default tool prompt + instruction decode_assistant_message_from_content Loop until there’s no more tool calls

Pass tool response as next turn (built-in tool & custom tool differ)

RAG toolgroups=(builtin::rag, args: {vector_db_ids})

force_retrieval=?

default tool prompt + instruction Retrieve context from RAG tool before calling generation. We should add an ability to force retrieval & ability for auto retrieval via model tool calling
ReACT instructions=react_prompt

output_parser=react_output_parser

ReACT prompting (thought-action-answer) Parse from action / action_input into ToolCall as part of Agent Response. Loop until there’s no more tool calls

Pass tool response as next turn

Proof of Concept Implementation

response_format: Optional[ResponseFormat] = None
stream: Optional[bool] = False
logprobs: Optional[LogProbConfig] = None
response_output_parser: Optional[ResponseOutputParser] = Field(default=ResponseOutputParser.default)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced offline w/ @ashwinb @hardikjshah , we will hold off this and put parsers on client SDK side.

@hardikjshah
Copy link
Copy Markdown
Contributor

Summarizing what we discussed offline --

  1. No need for a ResponseOutputParser as this can be done client side ( for the time being )

  2. Lets go away from the custom format to structured outputs

Thought: I need to transform the image that I received in the previous observation to make it green.
Action:
{
  "action": "image_transformer",
  "action_input": {"image": "image_1.jpg"}
}<end_action>
class Response: 
    thought: str
    tool_name: Optional[str]
    tool_params: Optional[str]
    answer: Optional[str]

[ not sure if this will work, i think you ll have to test on this format and see what works , for eg. does optional work consistently ]

  1. Encapsulate all ReACTAgent logic in one class so that end users using it can do so with 1-2 lines of code.

@yanxi0830
Copy link
Copy Markdown
Contributor Author

Closing PR, moving to https://github.com/meta-llama/llama-stack/discussions/975

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants