feat(model-routing): per-tier model access gates and billing multipliers #31
RustMunkey merged 4 commits into main
Walkthrough
Sequence Diagram(s)
sequenceDiagram
    actor Client
    participant API as API Service
    participant Model as "@maschina/model"
    participant Jobs as Jobs Service
    participant Daemon as Daemon Orchestrator
    participant Runtime as Python Runtime
    Client->>API: POST /agents/:id/run (body: input, optional model)
    API->>Model: validateModelAccess(tier, model?)
    Model-->>API: {allowed, reason?}
    alt Access Denied
        API-->>Client: 403 Forbidden
    else Access Allowed
        API->>Model: resolveModel(tier, requested)
        Model-->>API: resolved_model_id
        API->>API: compute systemPrompt, timeoutSecs
        API->>Jobs: dispatchAgentRun(..., model: resolved_model_id, systemPrompt, timeoutSecs)
        Jobs->>Daemon: enqueue AgentExecuteJob(..., model, system_prompt, timeout_secs)
        Daemon->>Daemon: convert to JobToRun(..., model, system_prompt)
        Daemon->>Runtime: POST /run {plan_tier, model, system_prompt, max_tokens, timeout_secs, input_payload}
        alt Model is Ollama
            Runtime->>Runtime: OllamaRunner.execute(local)
            Runtime-->>Daemon: RunResponse(output_payload, billed_tokens)
        else Model is Anthropic
            Runtime->>Runtime: AnthropicRunner.execute(cloud) + apply multiplier
            Runtime-->>Daemon: RunResponse(output_payload, billed_tokens)
        end
        Daemon-->>Jobs: mark complete / persist output_payload
        Jobs-->>API: run result
        API-->>Client: execution result (model_used, output)
    end
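The gate-then-resolve step at the top of the diagram could look roughly like the sketch below. It is written in Python for illustration only; the actual `validateModelAccess` and `resolveModel` live in `@maschina/model` and are not shown in this PR page. The tier names, model IDs, default-model table, and the `handle_run` helper are all hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical tier -> allowed-model table. The real lists are defined in
# @maschina/model and are not visible here.
TIER_MODELS: dict[str, list[str]] = {
    "free": ["ollama/example-local"],
    "pro": ["ollama/example-local", "anthropic/example-mid"],
    "enterprise": [
        "ollama/example-local",
        "anthropic/example-mid",
        "anthropic/example-opus",
    ],
}

# Hypothetical default used when the request body carries no model.
TIER_DEFAULT: dict[str, str] = {
    "free": "ollama/example-local",
    "pro": "anthropic/example-mid",
    "enterprise": "anthropic/example-mid",
}


@dataclass(frozen=True)
class AccessResult:
    allowed: bool
    reason: Optional[str] = None


def validate_model_access(tier: str, model: Optional[str] = None) -> AccessResult:
    """Mirror of the {allowed, reason?} shape returned in the diagram."""
    allowed_models = TIER_MODELS.get(tier)
    if allowed_models is None:
        return AccessResult(False, f"unknown tier: {tier}")
    if model is None:
        # No explicit request: the tier default will be resolved later.
        return AccessResult(True)
    if model not in allowed_models:
        return AccessResult(False, f"model {model} is not available on tier {tier}")
    return AccessResult(True)


def resolve_model(tier: str, requested: Optional[str] = None) -> str:
    """Return the requested model, or fall back to the tier default."""
    return requested if requested is not None else TIER_DEFAULT[tier]


def handle_run(tier: str, requested: Optional[str] = None) -> tuple[int, dict]:
    """Gate first, then resolve: a denied request never reaches dispatch."""
    access = validate_model_access(tier, requested)
    if not access.allowed:
        return 403, {"error": access.reason}
    return 200, {"model": resolve_model(tier, requested)}
```

Validating before resolving keeps the 403 path free of side effects: nothing is enqueued for a model the tier cannot use.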
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
What
Per-tier model access gates and billing multipliers across the full stack (API → NATS → daemon → Python runtime).
Why
Models differ widely in cost. Without routing, a user on any tier could request Claude Opus and burn quota at 15x the rate. This PR enforces per-tier model access and applies per-model billing multipliers so usage is billed correctly.
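As a rough illustration of the billing side, a multiplier can be applied to the raw token count before it is reported as `billed_tokens`. This is a minimal sketch, not the runtime's code: only the 15x figure comes from the description above, and the model IDs, the baseline multiplier of 1, and the `billed_tokens` helper are assumptions.

```python
import math

# Hypothetical multiplier table. Only the 15x figure is taken from the PR
# description; the model IDs and the baseline value are illustrative.
BILLING_MULTIPLIERS = {
    "example-baseline": 1,
    "anthropic/example-opus": 15,
}


def billed_tokens(raw_tokens: int, model: str) -> int:
    """Scale raw token usage by the model's multiplier, rounding up."""
    if model not in BILLING_MULTIPLIERS:
        # Refuse to guess a rate: silently defaulting could under-bill.
        raise ValueError(f"no billing multiplier configured for {model}")
    return math.ceil(raw_tokens * BILLING_MULTIPLIERS[model])
```

Under these assumed values, 1,000 raw tokens on the Opus-class entry are billed as 15,000, while the same usage on the baseline entry is billed as 1,000.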
How
Testing
Checklist: check everything except integration tests (none added).