Skip to content

Discussion #975: [RFC] Agent Categorization + ReACT Agent OutputParser #4582

@raghotham

Description

@raghotham

Originally posted by @yanxi0830 on 2025-02-05 22:16 UTC
Category: Ideas
Status: Closed on 2025-02-13 18:29 UTC


-- moving over discussions from #955

Problem

We want to standardize the steps necessary for users to build a ReACT agent with the ability to interleave between generating thoughts and taking task specific actions dynamically.

The current agent orchestration loop requires ad hoc logic for intercepting agent outputs and parsing outputs from output messages to fit a ReACT framework (example). This proposes changes to LlamaStack client SDKs and server APIs for better ergonomics to build an ReACT agent.

Proposed Solution

We want to have the flexibility to configure custom prompts and custom output parsers in agent loop execution.

  1. Introduce the notion of OutputParser for parsing outputs from ReACT prompting into ToolCall agent output.
    Our current agent loop with custom tool calls will loop and call tools until there’s no more tool response. In ReACT framework, action output typically maps to a tool call. We can re-utilize the agent loop, but add a parsing logic right after agent outputs to populate “action” into ToolCall to enable ReACT.
  1. For further generalization: RFC for high-level concept categorization of what defines an Agent Type.

Current Agent Types Summary

  • An Agent instance is defined by an AgentConfig
  • An Agent instance can be categorized into several classes
    • Vanilla
      • keep track of conversation loop history
    • RAG
      • access to "builtin::rag" tool
      • we current first force retrieve context by explicitly calling memory tool
    • ToolCalling
      • could be configured to use "builtin::websearch" / "builtin::code_interpreter" / "builtin::wolfram_alpha" / "builtin::filesystem" / etc
    • ReACT
      • require custom output parser to execute "action" as tool calls
Agent Type Agent Config Template System Prompt Output Parser Orchestration Note
Vanilla raw Agent instruction Conversation Loop
Tool Calling toolgroups=[“builtin::websearch”, “builtin::code_interpreter”] default tool prompt + instruction decode_assistant_message_from_content Loop until there’s no more tool calls

Pass tool response as next turn (built-in tool & custom tool differ)

RAG toolgroups=(builtin::rag, args: {vector_db_ids})

force_retrieval=?

default tool prompt + instruction Retrieve context from RAG tool before calling generation. We should add an ability to force retrieval & ability for auto retrieval via model tool calling
ReACT instructions=react_prompt

output_parser=react_output_parser

ReACT prompting (thought-action-answer) Parse from action / action_input into ToolCall as part of Agent Response. Loop until there’s no more tool calls

Pass tool response as next turn

Proof of Concept Implementation


Migrated from discussion #975: https://github.com/llamastack/llama-stack/discussions/975

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions