Support trajectory pinning for consistent model selection in agentic loops #813

@adilhafeez

Summary

In agentic loops, the same conversation hits Plano's routing endpoint multiple times. Today each call re-evaluates routing independently, so the selected model can change mid-conversation. Session pinning ensures that once a model is selected for a session, subsequent routing calls in that session return the same model.

How it works

The caller sends an X-Session-Id header in the routing request. Plano caches the routing decision keyed by that ID:

  • First call with X-Session-Id: abc → run routing, select model, cache abc → gpt-4o
  • Subsequent calls with X-Session-Id: abc → skip routing, return cached gpt-4o
  • Cache entries expire via configurable TTL (default 30 min)
  • No X-Session-Id header → routing runs fresh every time (current behavior, no breaking change)
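The flow above can be sketched as a small in-memory cache. This is an illustrative sketch, not Plano's actual code: `SessionPins`, `resolve`, and the `route` closure (a stand-in for the real routing call) are hypothetical names, and the current time is passed in explicitly so expiry is easy to exercise.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical pin cache: session ID -> (model, time the pin was created).
pub struct SessionPins {
    ttl: Duration,
    entries: Mutex<HashMap<String, (String, Instant)>>,
}

impl SessionPins {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns `(model, pinned)`. `route` only runs when there is no valid pin.
    pub fn resolve<F>(&self, session_id: Option<&str>, now: Instant, route: F) -> (String, bool)
    where
        F: FnOnce() -> String,
    {
        // No session ID: route fresh every time and cache nothing.
        let Some(id) = session_id else {
            return (route(), false);
        };

        let mut entries = self.entries.lock().unwrap();
        if let Some((model, pinned_at)) = entries.get(id) {
            // A pin is valid while its age is strictly below the TTL.
            if now.duration_since(*pinned_at) < self.ttl {
                return (model.clone(), true);
            }
        }

        // Miss or expired entry: run routing and (re)pin the session.
        let model = route();
        entries.insert(id.to_string(), (model.clone(), now));
        (model, false)
    }
}
```

In this sketch the lock is held while `route` runs, so two concurrent first calls for the same session cannot pin different models, at the cost of serializing routing across sessions. A real handler, especially an async one, would likely want a finer-grained scheme.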

Request

POST /routing/v1/chat/completions
X-Session-Id: session-abc-123
Content-Type: application/json

{
  "messages": [...]
}

Response

{
  "model": "gpt-4o",
  "route": "quick",
  "session_id": "session-abc-123",
  "pinned": true,
  "trace_id": "abc123..."
}

pinned: true indicates the result came from the cache. pinned: false indicates a fresh routing decision, as on the first call for a session.

Implementation

  • Extract X-Session-Id from request headers in the routing handler
  • Add an in-memory TTL cache in RouterService keyed by session ID (e.g. HashMap<String, (String, Instant)> behind a mutex)
  • Before calling determine_route(), check cache for a valid (non-expired) entry
  • On cache miss, run routing and store the result
  • TTL configurable via routing.session_ttl_seconds in plano config (default 1800)
  • No database or external state needed
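Two of the handler-side pieces listed above, header extraction and the TTL default, might look like the following. This is a sketch under assumptions: `extract_session_id` and `session_ttl` are hypothetical helpers, headers are modeled as plain `(name, value)` pairs rather than any particular HTTP library's type, and treating a blank header value as absent is a choice made here, not something the proposal specifies.

```rust
use std::time::Duration;

/// Default from the proposal: 1800 seconds (30 minutes).
const DEFAULT_SESSION_TTL_SECS: u64 = 1800;

/// Finds `X-Session-Id` in a list of (name, value) header pairs.
/// HTTP header names are case-insensitive, so the name is compared that way.
/// Assumption: a blank value is treated the same as a missing header.
pub fn extract_session_id(headers: &[(&str, &str)]) -> Option<String> {
    headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("x-session-id"))
        .map(|(_, value)| value.trim())
        .filter(|value| !value.is_empty())
        .map(str::to_owned)
}

/// Resolves the TTL from an optional `routing.session_ttl_seconds` value,
/// falling back to the default when the key is not set.
pub fn session_ttl(configured_secs: Option<u64>) -> Duration {
    Duration::from_secs(configured_secs.unwrap_or(DEFAULT_SESSION_TTL_SECS))
}
```

Returning `None` for a missing header keeps the no-header path identical to current behavior: the caller simply skips the cache lookup and routes fresh.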
