
Support v1/responses API with state management in the agent orchestrator path #791

@llrightll

Description


Motivation

Multi-turn chatbots that use Plano's agent orchestration for intent-based routing currently cannot leverage the Responses API's previous_response_id for stateful conversations. Users must manage conversation history client-side.

Problem

The state management infrastructure (StateStorage trait, ResponsesStateProcessor, memory/PostgreSQL backends) is only wired into the direct proxy path (crates/brightstaff/src/handlers/llm.rs), not the agent orchestrator path (crates/brightstaff/src/handlers/agent_chat_completions.rs).

Technical gaps

  1. crates/brightstaff/src/main.rs — state_storage is passed to llm_chat but not to agent_chat. The agent_chat function signature has no StateStorage parameter.

  2. crates/brightstaff/src/handlers/agent_chat_completions.rs — handle_agent_chat_inner() calls client_request.get_messages() early, converting to Vec<OpenAIMessage>. For Responses API requests with InputItem types (tool results, images), this conversion may lose information. No previous_response_id handling exists.

  3. crates/brightstaff/src/handlers/pipeline_processor.rs — invoke_agent hardcodes the endpoint URL path to /v1/chat/completions when calling downstream agents. The request body is serialized generically via ProviderRequestType::to_bytes(), but the URL forces agents to receive calls at the chat completions endpoint regardless of the original request format.

  4. crates/brightstaff/src/handlers/response_handler.rs — create_streaming_response in the agent path is a raw byte passthrough with no stream processing. Compare this to llm.rs, where responses go through ObservableStreamProcessor and optionally ResponsesStateProcessor.

  5. crates/brightstaff/src/state/response_state_processor.rs — Only instantiated in llm.rs. Never used in the agent orchestration flow.
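As a minimal illustration of gap 3, the downstream path could be derived from the original request format instead of being hardcoded. This is a hedged sketch: RequestFormat and downstream_path are hypothetical names for illustration, not Plano types; only the two URL paths come from the issue itself.

```rust
// Hypothetical request-format tag for illustration; not an actual Plano type.
enum RequestFormat {
    ChatCompletions,
    Responses,
}

// Pick the downstream endpoint path from the original request format,
// rather than always sending agents /v1/chat/completions as invoke_agent
// does today.
fn downstream_path(format: &RequestFormat) -> &'static str {
    match format {
        RequestFormat::ChatCompletions => "/v1/chat/completions",
        RequestFormat::Responses => "/v1/responses",
    }
}
```

Whether agents should actually receive /v1/responses calls, or whether the orchestrator should translate to chat completions internally and only speak the Responses API at the edge, is a design decision this issue leaves open.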

Proposed solution (following existing patterns)

  1. Pass state_storage to agent_chat from main.rs (same pattern as llm_chat).
  2. For Responses API requests with previous_response_id: resolve stored state via StateStorage, convert InputItem → OpenAIMessage for determine_orchestration() / agent selection.
  3. For the final agent's response: apply the same stream translation pipeline used in llm.rs — translate chat completions SSE into Responses API format (via hermesllm's translation layer), then wrap with ResponsesStateProcessor to capture response_id and output from the translated response.completed event.

For multi-agent chains within a single turn, the state processor should wrap only the final combined response (the orchestrator already distinguishes is_last_agent), so intermediate agent responses are not stored individually.
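The state flow in steps 2 and 3 can be sketched with an in-memory store. This is a simplified stand-in, assuming nothing about Plano's internals: OpenAIMessage here is a two-field struct and MemoryStateStorage is a hypothetical illustration of the StateStorage pattern, not the actual trait or its backends.

```rust
use std::collections::HashMap;

// Simplified stand-in for a chat message; the real type has more fields.
#[derive(Clone, Debug, PartialEq)]
struct OpenAIMessage {
    role: String,
    content: String,
}

// Hypothetical in-memory store keyed by response_id, mirroring the
// memory backend already wired into the direct proxy path.
#[derive(Default)]
struct MemoryStateStorage {
    turns: HashMap<String, Vec<OpenAIMessage>>,
}

impl MemoryStateStorage {
    // Step 2: resolve previous_response_id by prepending the stored
    // conversation history to the new turn's input.
    fn resolve(
        &self,
        previous_response_id: Option<&str>,
        input: Vec<OpenAIMessage>,
    ) -> Vec<OpenAIMessage> {
        let mut messages = previous_response_id
            .and_then(|id| self.turns.get(id).cloned())
            .unwrap_or_default();
        messages.extend(input);
        messages
    }

    // Step 3: after the final agent's response.completed event, persist the
    // full transcript under the new response_id for the next turn.
    fn store(&mut self, response_id: &str, transcript: Vec<OpenAIMessage>) {
        self.turns.insert(response_id.to_string(), transcript);
    }
}
```

In the real implementation, store() would be driven by ResponsesStateProcessor capturing the response_id from the translated stream, so that only the final combined response of a multi-agent chain is persisted.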

