
feat(model-routing): per-tier model access gates and billing multipliers#31

Merged
RustMunkey merged 4 commits into main from feat/model-routing on Mar 8, 2026

Conversation

Owner

@RustMunkey RustMunkey commented Mar 8, 2026

What

Per-tier model access gates and billing multipliers across the full stack (API → NATS → daemon → Python runtime).

Why

Different models have wildly different costs. Without routing, any user on any tier could request Claude Opus and burn quota at 15x the rate. This enforces access and bills correctly.

How

  • packages/model — new TS model catalog: 3 Anthropic models + Ollama local, each with a minimum tier and billing multiplier
  • API layer validates the requested model against the caller's tier before dispatch, resolves system prompt from agent.config.systemPrompt
  • NATS job payload now carries model + systemPrompt through to the daemon
  • Python runtime routes by model prefix (ollama/* vs Anthropic), applies multiplier to token counts before returning
  • Fixed two daemon bugs: was calling /execute instead of /run, and RunOutput.payload didn't match Python's output_payload
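The catalog shape and tier gating described above can be sketched as follows. `validateModelAccess` and `resolveModel` are the helper names used elsewhere in this PR; the tier names, model IDs, and multiplier values here are illustrative assumptions, not the actual contents of `packages/model/src/catalog.ts`.

```typescript
// Sketch of the per-tier model catalog and gating helpers. Tier names,
// model IDs, and multipliers are illustrative, not the real catalog.
type Tier = "free" | "pro" | "enterprise";

interface ModelEntry {
  id: string;
  minTier: Tier;      // lowest plan tier allowed to request this model
  multiplier: number; // billing multiplier applied to raw token counts
}

const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, enterprise: 2 };

const CATALOG: ModelEntry[] = [
  { id: "ollama/llama3", minTier: "free", multiplier: 1 },
  { id: "claude-haiku", minTier: "free", multiplier: 1 },
  { id: "claude-sonnet", minTier: "pro", multiplier: 5 },
  { id: "claude-opus", minTier: "enterprise", multiplier: 15 },
];

// Gate: a request is allowed only if the caller's tier ranks at or above
// the model's minimum tier.
function validateModelAccess(
  tier: Tier,
  modelId: string,
): { allowed: boolean; reason?: string } {
  const entry = CATALOG.find((m) => m.id === modelId);
  if (!entry) return { allowed: false, reason: "unknown model" };
  if (TIER_RANK[tier] < TIER_RANK[entry.minTier]) {
    return { allowed: false, reason: `requires ${entry.minTier} tier` };
  }
  return { allowed: true };
}

// Fallback: when no model is requested (or the request is denied upstream),
// resolve to the first catalog entry the tier can access.
function resolveModel(tier: Tier, requested?: string): string {
  if (requested && validateModelAccess(tier, requested).allowed) {
    return requested;
  }
  return CATALOG.find((m) => TIER_RANK[tier] >= TIER_RANK[m.minTier])!.id;
}
```

With this shape, the API layer can deny before dispatch (403) rather than letting the daemon discover an inaccessible model mid-run.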

Testing

  • Unit tests added or updated — 20 vitest tests in packages/model/src/catalog.test.ts, pytest routing tests in services/runtime/tests/
  • Integration tests — none added (not applicable)
  • Tested locally against Docker stack

Summary by CodeRabbit

  • New Features
    • Model selection for agent runs with local (Ollama) vs cloud routing, tier-based access control, resolved fallbacks, per-model billing multipliers, configurable system prompts, and execution timeouts.
  • Tests
    • Added unit tests for model catalog and runner routing.
  • Chores
    • CI and test scripts updated to include the runtime service; lint-staged config replaced with a new module-based setup.
  • Bug Fixes
    • Runtime run output field renamed to output_payload (consumed by downstream services).


coderabbitai bot commented Mar 8, 2026

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Free

Run ID: 5de77176-259e-42fc-a322-01237f9775e8

📥 Commits

Reviewing files that changed from the base of the PR and between 3874db8 and 88fbee9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (1)
  • packages/model/package.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/model/package.json

📝 Walkthrough


Adds a new @maschina/model package for model cataloging and access; moves lint-staged config from JSON to ESM; propagates model and system_prompt through API, jobs, and daemon; renames RunOutput.payload to output_payload; Python runtime routes to Ollama or Anthropic with per-model billing multipliers; tests and CI updated.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CI & Linting**<br>`.github/workflows/ci.yml`, `.lintstagedrc.json`, `.lintstagedrc.mjs`, `package.json` | Added `packages/model` to TS build steps; replaced deleted `.lintstagedrc.json` with `.lintstagedrc.mjs` that filters non-existent files; included `services/runtime` in Python pytest/CI scripts and added `pytest:runtime-service`. |
| **Model package**<br>`packages/model/package.json`, `packages/model/tsconfig.json`, `packages/model/src/index.ts`, `packages/model/src/catalog.ts`, `packages/model/src/catalog.test.ts` | New package exposing a model catalog, tier defaults, per-model multipliers, access/resolve helpers, unit tests, and build/publish metadata. |
| **API & Validation**<br>`packages/validation/src/schemas/agent.ts`, `services/api/src/routes/agents.ts`, `services/api/package.json`, `services/api/Dockerfile` | Added optional `model` to `RunAgentSchema`; API validates and resolves the model for the caller's tier, computes `systemPrompt` and `timeoutSecs`, and passes them onward; added workspace dependency and Docker build step for `@maschina/model`. |
| **Jobs / Types**<br>`packages/jobs/src/types.ts`, `packages/jobs/src/dispatch.ts` | Extended `AgentExecuteJob` and `dispatchAgentRun` to include `model` and `systemPrompt`, and forwarded them in dispatch payloads. |
| **Daemon orchestration & runtime**<br>`services/daemon/src/orchestrator/...`, `services/daemon/src/runtime/mod.rs` | Propagated `model`/`system_prompt` through job structs and queueing; renamed `RunOutput.payload` to `output_payload`; expanded `RuntimeRequest` with `plan_tier`, `model`, `system_prompt`, `max_tokens`, `timeout_secs`; changed the runtime endpoint to `/run`. |
| **Python runtime & tests**<br>`services/runtime/src/runner.py`, `services/runtime/tests/test_runner_routing.py` | Runtime routes `ollama/*` to the local `OllamaRunner` and other models to `AnthropicRunner`, applies per-model billing multipliers, reports billed tokens, lazy-inits the cloud client; adds unit tests for routing/multiplier helpers. |
| **Daemon SQL mapping**<br>`services/daemon/src/orchestrator/analyze.rs` | Updated persistence to use `output_payload` (the renamed `RunOutput` field) when updating DB records. |
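The routing and billing rules in the "Python runtime & tests" row are implemented in Python (`services/runtime/src/runner.py`); the following TypeScript sketch mirrors that logic under stated assumptions. The multiplier values and function names are illustrative, not taken from the runtime source.

```typescript
// Sketch of the runtime's routing-by-prefix and billing-multiplier logic.
// Multiplier values are illustrative assumptions.
const MULTIPLIERS: Record<string, number> = {
  "claude-haiku": 1,
  "claude-sonnet": 5,
  "claude-opus": 15,
};

// Models under the ollama/ prefix run on the local runner; everything else
// is dispatched to the cloud (Anthropic) runner.
function selectRunner(model: string): "ollama" | "anthropic" {
  return model.startsWith("ollama/") ? "ollama" : "anthropic";
}

// Local models bill raw token counts (effective multiplier 1); cloud models
// scale raw tokens by the per-model multiplier before the count is returned
// to the daemon as billed_tokens.
function billedTokens(model: string, rawTokens: number): number {
  const multiplier =
    selectRunner(model) === "ollama" ? 1 : MULTIPLIERS[model] ?? 1;
  return Math.ceil(rawTokens * multiplier);
}
```

This keeps the quota math in one place: the daemon only ever sees the already-multiplied count.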

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant API as API Service
    participant Model as "@maschina/model"
    participant Jobs as Jobs Service
    participant Daemon as Daemon Orchestrator
    participant Runtime as Python Runtime

    Client->>API: POST /agents/:id/run (body: input, optional model)
    API->>Model: validateModelAccess(tier, model?)
    Model-->>API: {allowed, reason?}
    alt Access Denied
        API-->>Client: 403 Forbidden
    else Access Allowed
        API->>Model: resolveModel(tier, requested)
        Model-->>API: resolved_model_id
        API->>API: compute systemPrompt, timeoutSecs
        API->>Jobs: dispatchAgentRun(..., model: resolved_model_id, systemPrompt, timeoutSecs)
        Jobs->>Daemon: enqueue AgentExecuteJob(..., model, system_prompt, timeout_secs)
        Daemon->>Daemon: convert to JobToRun(..., model, system_prompt)
        Daemon->>Runtime: POST /run {plan_tier, model, system_prompt, max_tokens, timeout_secs, input_payload}
        alt Model is Ollama
            Runtime->>Runtime: OllamaRunner.execute(local)
            Runtime-->>Daemon: RunResponse(output_payload, billed_tokens)
        else Model is Anthropic
            Runtime->>Runtime: AnthropicRunner.execute(cloud) + apply multiplier
            Runtime-->>Daemon: RunResponse(output_payload, billed_tokens)
        end
        Daemon-->>Jobs: mark complete / persist output_payload
        Jobs-->>API: run result
        API-->>Client: execution result (model_used, output)
    end
```
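The `/run` exchange in the diagram can also be read as a pair of payload shapes. Field names below follow the diagram; the TypeScript types and example values are assumptions, not the daemon's actual `RuntimeRequest`/`RunResponse` definitions.

```typescript
// Illustrative shapes for the daemon -> runtime /run exchange.
// Field names match the sequence diagram; types are assumed.
interface RunRequest {
  plan_tier: string;
  model: string;
  system_prompt: string;
  max_tokens: number;
  timeout_secs: number;
  input_payload: Record<string, unknown>;
}

interface RunResponse {
  output_payload: Record<string, unknown>; // renamed from `payload` in this PR
  billed_tokens: number; // raw tokens already scaled by the model multiplier
}

// Example response a runner might return (values illustrative):
const exampleResponse: RunResponse = {
  output_payload: { text: "done" },
  billed_tokens: 1500, // e.g. 100 raw tokens at a 15x multiplier
};
```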

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through catalogs, models in tow,
Ollama for local, cloud where winds blow,
Multipliers counted, prompts tucked just right,
From TypeScript maps to Python's night,
A rabbit's routing dance — soft and spry.



@RustMunkey RustMunkey merged commit 215faef into main Mar 8, 2026
24 of 26 checks passed
@RustMunkey RustMunkey deleted the feat/model-routing branch March 15, 2026 02:08