
feat(model-routing): per-tier model access gates and billing multipliers#31

Merged
RustMunkey merged 4 commits into main from feat/model-routing on Mar 8, 2026

Conversation

Owner

@RustMunkey RustMunkey commented Mar 8, 2026

What

Per-tier model access gates and billing multipliers across the full stack (API → NATS → daemon → Python runtime).

Why

Different models have wildly different costs. Without routing, any user on any tier could request Claude Opus and burn quota at 15x the rate. This enforces access and bills correctly.

How

  • packages/model — new TS model catalog: 3 Anthropic models + Ollama local, each with a minimum tier and billing multiplier
  • API layer validates the requested model against the caller's tier before dispatch, resolves system prompt from agent.config.systemPrompt
  • NATS job payload now carries model + systemPrompt through to the daemon
  • Python runtime routes by model prefix (ollama/* vs Anthropic), applies multiplier to token counts before returning
  • Fixed two daemon bugs: was calling /execute instead of /run, and RunOutput.payload didn't match Python's output_payload
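The catalog shape and tier gating described above can be sketched as follows. `validateModelAccess` and `resolveModel` are the helper names used elsewhere in this PR; the tier names, model IDs, and multiplier values here are illustrative assumptions, not the actual contents of `packages/model/src/catalog.ts`.

```typescript
// Sketch of the per-tier model catalog and gating helpers. Tier names,
// model IDs, and multipliers are illustrative, not the real catalog.
type Tier = "free" | "pro" | "enterprise";

interface ModelEntry {
  id: string;
  minTier: Tier;      // lowest plan tier allowed to request this model
  multiplier: number; // billing multiplier applied to raw token counts
}

const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, enterprise: 2 };

const CATALOG: ModelEntry[] = [
  { id: "ollama/llama3", minTier: "free", multiplier: 1 },
  { id: "claude-haiku", minTier: "free", multiplier: 1 },
  { id: "claude-sonnet", minTier: "pro", multiplier: 5 },
  { id: "claude-opus", minTier: "enterprise", multiplier: 15 },
];

// Gate: a request is allowed only if the caller's tier ranks at or above
// the model's minimum tier.
function validateModelAccess(
  tier: Tier,
  modelId: string,
): { allowed: boolean; reason?: string } {
  const entry = CATALOG.find((m) => m.id === modelId);
  if (!entry) return { allowed: false, reason: "unknown model" };
  if (TIER_RANK[tier] < TIER_RANK[entry.minTier]) {
    return { allowed: false, reason: `requires ${entry.minTier} tier` };
  }
  return { allowed: true };
}

// Fallback: when no model is requested (or the request is denied upstream),
// resolve to the first catalog entry the tier can access.
function resolveModel(tier: Tier, requested?: string): string {
  if (requested && validateModelAccess(tier, requested).allowed) {
    return requested;
  }
  return CATALOG.find((m) => TIER_RANK[tier] >= TIER_RANK[m.minTier])!.id;
}
```

With this shape, the API layer can deny before dispatch (403) rather than letting the daemon discover an inaccessible model mid-run.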

Testing

  • Unit tests added or updated — 20 vitest tests in packages/model/src/catalog.test.ts, pytest routing tests in services/runtime/tests/
  • Integration tests — none added (not applicable)
  • Tested locally against Docker stack

Summary by CodeRabbit

  • New Features
    • Model selection for agent runs with local (Ollama) vs cloud routing, tier-based access control, resolved fallbacks, per-model billing multipliers, configurable system prompts, and execution timeouts.
  • Tests
    • Added unit tests for model catalog and runner routing.
  • Chores
    • CI and test scripts updated to include the runtime service; lint-staged config replaced with a new module-based setup.
  • Bug Fixes
    • Runtime run output field renamed to output_payload (consumed by downstream services).


coderabbitai bot commented Mar 8, 2026

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Free

Run ID: 5de77176-259e-42fc-a322-01237f9775e8

📥 Commits

Reviewing files that changed from the base of the PR and between 3874db8 and 88fbee9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (1)
  • packages/model/package.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/model/package.json

📝 Walkthrough


Adds a new @maschina/model package for model cataloging and access; moves lint-staged config from JSON to ESM; propagates model and system_prompt through API, jobs, and daemon; renames RunOutput.payload to output_payload; Python runtime routes to Ollama or Anthropic with per-model billing multipliers; tests and CI updated.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CI & Linting**<br>`.github/workflows/ci.yml`, `.lintstagedrc.json`, `.lintstagedrc.mjs`, `package.json` | Added `packages/model` to TS build steps; replaced deleted `.lintstagedrc.json` with `.lintstagedrc.mjs` that filters non-existent files; included `services/runtime` in Python pytest/CI scripts and added `pytest:runtime-service`. |
| **Model package**<br>`packages/model/package.json`, `packages/model/tsconfig.json`, `packages/model/src/index.ts`, `packages/model/src/catalog.ts`, `packages/model/src/catalog.test.ts` | New package exposing a model catalog, tier defaults, per-model multipliers, access/resolve helpers, unit tests, and build/publish metadata. |
| **API & Validation**<br>`packages/validation/src/schemas/agent.ts`, `services/api/src/routes/agents.ts`, `services/api/package.json`, `services/api/Dockerfile` | Added optional `model` to `RunAgentSchema`; API validates and resolves the model for the caller's tier, computes `systemPrompt` and `timeoutSecs`, and passes them onward; added workspace dependency and Docker build step for `@maschina/model`. |
| **Jobs / Types**<br>`packages/jobs/src/types.ts`, `packages/jobs/src/dispatch.ts` | Extended `AgentExecuteJob` and `dispatchAgentRun` to include `model` and `systemPrompt`, and forwarded them in dispatch payloads. |
| **Daemon orchestration & runtime**<br>`services/daemon/src/orchestrator/...`, `services/daemon/src/runtime/mod.rs` | Propagated `model`/`system_prompt` through job structs and queueing; renamed `RunOutput.payload` to `output_payload`; expanded `RuntimeRequest` with `plan_tier`, `model`, `system_prompt`, `max_tokens`, `timeout_secs`; changed the runtime endpoint to `/run`. |
| **Python runtime & tests**<br>`services/runtime/src/runner.py`, `services/runtime/tests/test_runner_routing.py` | Runtime routes `ollama/*` to the local `OllamaRunner` and other models to `AnthropicRunner`, applies per-model billing multipliers, reports billed tokens, lazy-inits the cloud client; adds unit tests for routing/multiplier helpers. |
| **Daemon SQL mapping**<br>`services/daemon/src/orchestrator/analyze.rs` | Updated persistence to use `output_payload` (the renamed `RunOutput` field) when updating DB records. |
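The routing and billing rules in the "Python runtime & tests" row are implemented in Python (`services/runtime/src/runner.py`); the following TypeScript sketch mirrors that logic under stated assumptions. The multiplier values and function names are illustrative, not taken from the runtime source.

```typescript
// Sketch of the runtime's routing-by-prefix and billing-multiplier logic.
// Multiplier values are illustrative assumptions.
const MULTIPLIERS: Record<string, number> = {
  "claude-haiku": 1,
  "claude-sonnet": 5,
  "claude-opus": 15,
};

// Models under the ollama/ prefix run on the local runner; everything else
// is dispatched to the cloud (Anthropic) runner.
function selectRunner(model: string): "ollama" | "anthropic" {
  return model.startsWith("ollama/") ? "ollama" : "anthropic";
}

// Local models bill raw token counts (effective multiplier 1); cloud models
// scale raw tokens by the per-model multiplier before the count is returned
// to the daemon as billed_tokens.
function billedTokens(model: string, rawTokens: number): number {
  const multiplier =
    selectRunner(model) === "ollama" ? 1 : MULTIPLIERS[model] ?? 1;
  return Math.ceil(rawTokens * multiplier);
}
```

This keeps the quota math in one place: the daemon only ever sees the already-multiplied count.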

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant API as API Service
    participant Model as "@maschina/model"
    participant Jobs as Jobs Service
    participant Daemon as Daemon Orchestrator
    participant Runtime as Python Runtime

    Client->>API: POST /agents/:id/run (body: input, optional model)
    API->>Model: validateModelAccess(tier, model?)
    Model-->>API: {allowed, reason?}
    alt Access Denied
        API-->>Client: 403 Forbidden
    else Access Allowed
        API->>Model: resolveModel(tier, requested)
        Model-->>API: resolved_model_id
        API->>API: compute systemPrompt, timeoutSecs
        API->>Jobs: dispatchAgentRun(..., model: resolved_model_id, systemPrompt, timeoutSecs)
        Jobs->>Daemon: enqueue AgentExecuteJob(..., model, system_prompt, timeout_secs)
        Daemon->>Daemon: convert to JobToRun(..., model, system_prompt)
        Daemon->>Runtime: POST /run {plan_tier, model, system_prompt, max_tokens, timeout_secs, input_payload}
        alt Model is Ollama
            Runtime->>Runtime: OllamaRunner.execute(local)
            Runtime-->>Daemon: RunResponse(output_payload, billed_tokens)
        else Model is Anthropic
            Runtime->>Runtime: AnthropicRunner.execute(cloud) + apply multiplier
            Runtime-->>Daemon: RunResponse(output_payload, billed_tokens)
        end
        Daemon-->>Jobs: mark complete / persist output_payload
        Jobs-->>API: run result
        API-->>Client: execution result (model_used, output)
    end
```
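The `/run` exchange in the diagram can also be read as a pair of payload shapes. Field names below follow the diagram; the TypeScript types and example values are assumptions, not the daemon's actual `RuntimeRequest`/`RunResponse` definitions.

```typescript
// Illustrative shapes for the daemon -> runtime /run exchange.
// Field names match the sequence diagram; types are assumed.
interface RunRequest {
  plan_tier: string;
  model: string;
  system_prompt: string;
  max_tokens: number;
  timeout_secs: number;
  input_payload: Record<string, unknown>;
}

interface RunResponse {
  output_payload: Record<string, unknown>; // renamed from `payload` in this PR
  billed_tokens: number; // raw tokens already scaled by the model multiplier
}

// Example response a runner might return (values illustrative):
const exampleResponse: RunResponse = {
  output_payload: { text: "done" },
  billed_tokens: 1500, // e.g. 100 raw tokens at a 15x multiplier
};
```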

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through catalogs, models in tow,
Ollama for local, cloud where winds blow,
Multipliers counted, prompts tucked just right,
From TypeScript maps to Python's night,
A rabbit's routing dance — soft and spry.



@RustMunkey RustMunkey merged commit 215faef into main Mar 8, 2026
24 of 26 checks passed
@RustMunkey RustMunkey deleted the feat/model-routing branch March 15, 2026 02:08