GPU free-tier arbitrage routing policy with observable fallback #818

@Spherrrical

Description

Plano already sits in the routing layer, but it doesn't use that position to automatically reduce cost for developers. We should build a first-class GPU free-tier arbitrage policy: route low-stakes or bursty agent traffic to free/low-cost providers when they are available, fall back deterministically to the primary when they are unavailable or overloaded, and give full trace visibility into every routing decision.

Requirements

  • Configurable arbitrage policy at the model_providers level: specify a ranked list of free/low-cost providers with fallback ordering
  • Deterministic fallback: when a free-tier provider is unavailable, rate-limited, or errors, Plano falls back to the primary predictably
  • All routing decisions surfaced in traces: which provider was selected, why, when it fell back, and what the next selection was
  • Reliability guardrails: free-tier providers must not degrade silently; every failure must be explicit and logged
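
To make the fallback and tracing requirements concrete, here is a rough Python sketch of the intended behavior. It is not Plano's implementation: `route_with_fallback`, `ProviderError`, `TraceEvent`, and the provider names are all hypothetical, and providers are modeled as plain callables for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical error: provider unavailable, rate-limited, or failing."""


@dataclass
class TraceEvent:
    """One routing decision, as it might appear in a trace (assumed shape)."""
    provider: str
    outcome: str  # "selected" or "fallback"
    reason: str


# A provider is a (name, callable) pair; the callable takes a prompt.
Provider = Tuple[str, Callable[[str], str]]


def route_with_fallback(
    prompt: str,
    free_tier: List[Provider],
    primary: Provider,
) -> Tuple[str, List[TraceEvent]]:
    """Try free-tier providers in ranked order, then the primary.

    The order is fixed by the list, so the same failures always produce
    the same fallback path. Every failure is recorded, never swallowed.
    """
    trace: List[TraceEvent] = []
    for name, call in free_tier:
        try:
            result = call(prompt)
        except ProviderError as exc:
            # Explicit, logged failure: record why we moved on.
            trace.append(TraceEvent(name, "fallback", str(exc)))
            continue
        trace.append(TraceEvent(name, "selected", "free-tier available"))
        return result, trace

    # All free-tier providers failed; a primary failure propagates to the caller.
    name, call = primary
    result = call(prompt)
    trace.append(TraceEvent(name, "selected", "free-tier providers exhausted"))
    return result, trace
```

In this sketch the trace list doubles as the guardrail: a free-tier failure always leaves a `fallback` event behind, so a degraded provider cannot go unnoticed.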

What "done" looks like

A developer can add a minimal config block to enable arbitrage, run a request, and see in the trace: provider selected, reason (free-tier available), fallback chain if applicable.
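
As a starting point for discussion, the "minimal config block" might look something like the dictionary below once parsed. Only `model_providers` comes from this issue; the `arbitrage`, `free_tier`, and `primary` keys, the provider names, and the helper functions are assumptions made for illustration, not an existing Plano schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArbitragePolicy:
    """Hypothetical parsed form of the arbitrage config block."""
    free_tier: List[str]  # ranked: first entry is tried first
    primary: str


def parse_policy(config: dict) -> ArbitragePolicy:
    """Validate the assumed config shape and return a policy."""
    block = config["model_providers"]["arbitrage"]
    free_tier = list(block["free_tier"])
    primary = block["primary"]
    if not free_tier:
        raise ValueError("arbitrage requires at least one free-tier provider")
    if primary in free_tier:
        raise ValueError("primary must not also be listed as free-tier")
    return ArbitragePolicy(free_tier=free_tier, primary=primary)


def fallback_chain(policy: ArbitragePolicy) -> List[str]:
    """Full deterministic order: ranked free-tier providers, then primary."""
    return policy.free_tier + [policy.primary]


# Example config with made-up provider names.
example_config = {
    "model_providers": {
        "arbitrage": {
            "free_tier": ["free-provider-a", "free-provider-b"],
            "primary": "paid-provider",
        }
    }
}
```

Calling `fallback_chain(parse_policy(example_config))` yields the ordered chain that a trace would be expected to reflect when fallbacks occur.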
