LazyRouter is a lightweight OpenAI-compatible router that picks the best configured model for each request.
It is designed for simple operation: define providers/models in YAML, call model: "auto", and let the router choose.
In agentic workflows, context grows quickly and token usage gets expensive. Without smart routing, trivial prompts like "hi" or "hello" can still hit premium models (for example, Opus), which is not economical.
LazyRouter solves this by putting a cheap, fast router model in the middle as a gatekeeper:
- It chooses the right model for each request instead of always using the most expensive one.
- It reduces unnecessary spend in long-running agent sessions (especially OpenClaw-style workflows).
- It keeps a single OpenAI-compatible interface while handling provider differences behind the scenes.
It also helps translate behavior across API styles (OpenAI, Gemini, and Anthropic).
- OpenAI-compatible `/v1/chat/completions` endpoint
- LLM-based routing without extra training pipelines
- Mixed provider support in one config (OpenAI, Anthropic, Gemini, OpenAI-compatible gateways)
- Useful as a cost-control gatekeeper for agent frameworks like OpenClaw
- Built-in compatibility handling between OpenAI, Gemini, and Anthropic styles
- Streaming and non-streaming response support
- Health and benchmark endpoints for operational visibility
- Automatic model fallback on rate limits or errors (tries Elo-similar models)
- Exponential backoff retry when all models are temporarily unavailable
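The Elo-similar fallback in the bullets above can be pictured with a small sketch (illustrative Python only, not the actual `retry_handler.py` implementation; the model names and Elo values below are made up):

```python
def fallback_order(models, failed_model, elo_key="coding_elo"):
    """Order the remaining models by Elo similarity to the model that failed."""
    failed_elo = models[failed_model][elo_key]
    candidates = [name for name in models if name != failed_model]
    # Closest Elo first, so answer quality stays comparable after a fallback.
    return sorted(candidates, key=lambda name: abs(models[name][elo_key] - failed_elo))

# Hypothetical catalog; real values come from config.yaml.
catalog = {
    "gpt-4o": {"coding_elo": 1280},
    "gemini-2.5-flash": {"coding_elo": 1230},
    "claude-sonnet": {"coding_elo": 1270},
}
print(fallback_order(catalog, "gpt-4o"))  # → ['claude-sonnet', 'gemini-2.5-flash']
```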
- Install uv: https://docs.astral.sh/uv/getting-started/installation/
- Start LazyRouter directly from GitHub:

  ```bash
  uvx --from git+https://github.com/mysteriousHerb/lazyrouter.git lazyrouter --config config.yaml
  ```

  If `config.yaml` does not exist yet, LazyRouter now starts a setup UI instead of failing. Open the printed `/admin/config` URL, paste/edit your `config.yaml` and `.env`, save, then use the restart button to apply the config.
You no longer need to prepare `config.yaml` or `.env` manually; the setup UI can create both files for you on first run.
By default, `--config config.yaml` means:

- `config.yaml` is saved in the directory where you ran `uvx`
- `.env` is saved next to that config file
- both files are normal on-disk files, so they persist across future `uvx` runs
For example, if you launch from `/home/alice/projects/demo`, the setup UI will write:

- `/home/alice/projects/demo/config.yaml`
- `/home/alice/projects/demo/.env`
If you launch from a different folder later, LazyRouter will look in that folder unless you pass an explicit `--config` path.
If you want to use a specific env file path, add:

```bash
uvx --from git+https://github.com/mysteriousHerb/lazyrouter.git lazyrouter --config config.yaml --env-file .env
```

- Install uv: https://docs.astral.sh/uv/getting-started/installation/
- Clone the repo and install dependencies:

  ```bash
  git clone https://github.com/mysteriousHerb/lazyrouter
  cd lazyrouter
  uv sync
  ```

- Start the server:

  ```bash
  uv run python main.py --config config.yaml
  ```

- If `config.yaml` is missing, open `http://localhost:8000/admin/config` (or the printed host/port) and paste/edit your `config.yaml` and `.env`, then save and restart from the UI.
- Send requests to `http://localhost:1234/v1/chat/completions`.
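Any OpenAI-compatible client can talk to that endpoint. A minimal stdlib-only sketch (assuming the default `localhost:1234` address above; the actual request only succeeds against a running LazyRouter):

```python
import json
from urllib import request

# The OpenAI-compatible request body; "auto" lets LazyRouter pick the model.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain vector databases briefly"}],
}

def chat(base_url="http://localhost:1234"):
    """POST the payload to LazyRouter's chat completions endpoint."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# chat() needs a live server; inspecting the payload does not.
print(json.dumps(payload, indent=2))
```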
Use `config.example.yaml` as the base. API keys are loaded from `.env`.
- `coding_elo` / `writing_elo` in `llms` are quality signals you can source from https://arena.ai/leaderboard.
- `context_compression` controls how aggressively old history is trimmed to keep token usage/cost under control during long agent runs.
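As a mental model for that trimming, here is an illustrative sketch (not the actual `context_compressor.py` logic; the 4-characters-per-token estimate is a common rough heuristic):

```python
def trim_history(messages, max_history_tokens=2000, keep_recent_exchanges=2):
    """Drop the oldest messages once an estimated token budget is exceeded,
    always keeping the most recent user/assistant pairs intact."""
    def est(m):
        return max(1, len(m["content"]) // 4)  # rough 4-chars-per-token estimate

    tail = keep_recent_exchanges * 2
    protected = messages[-tail:] if tail else []
    older = messages[:-tail] if tail else list(messages)
    budget = max_history_tokens - sum(est(m) for m in protected)
    kept = []
    for msg in reversed(older):  # prefer the newest of the older messages
        cost = est(msg)
        if cost <= budget:
            kept.append(msg)
            budget -= cost
    return list(reversed(kept)) + protected
```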
If you want a very fast router with minimal added latency, a strong option is:
```yaml
router:
  provider: groq
  model: "openai/gpt-oss-120b"
```

This can work well as a low-latency routing model when your `groq` provider is configured in `providers`.
You can override the default routing prompt by adding a `prompt` field in the `router` section of your config:
```yaml
router:
  provider: gemini
  model: "gemini-2.5-flash"
  prompt: |
    You are a model router. Select the best model for the user's request.
    If the user explicitly requests a specific model, honor that request.
    Available models: {model_descriptions}
    Context: {context}
    Current request: {current_request}
    Respond with reasoning and model choice.
```

The prompt must include these placeholders: `{model_descriptions}`, `{context}`, and `{current_request}`.
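Since those three placeholders are mandatory, a custom prompt can be sanity-checked before use (a minimal sketch; LazyRouter performs its own validation at startup):

```python
REQUIRED = ("{model_descriptions}", "{context}", "{current_request}")

def check_router_prompt(prompt: str) -> list:
    """Return the required placeholders that are missing from a custom prompt."""
    return [p for p in REQUIRED if p not in prompt]

custom = "Pick a model.\nAvailable models: {model_descriptions}\nRequest: {current_request}"
print(check_router_prompt(custom))  # → ['{context}']
```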
The default routing prompt now supports explicit model requests from users. You can say things like:
- "Use opus for this task"
- "Route to gemini-2.5-pro"
- "Switch to claude-sonnet"
The router will honor these explicit requests and route to the specified model.
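One way such explicit requests could be detected is a simple match against configured model names (purely illustrative; the real router delegates this decision to the routing LLM via the prompt above):

```python
def explicit_model(user_text, model_names):
    """Return the first configured model the user names explicitly, if any."""
    lowered = user_text.lower()
    for name in model_names:
        if name.lower() in lowered:
            return name
    return None

models = ["opus", "gemini-2.5-pro", "claude-sonnet"]
print(explicit_model("Route to gemini-2.5-pro", models))  # → gemini-2.5-pro
```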
LazyRouter now exposes a browser-based config editor at `/admin/config`.
- If no config file exists, LazyRouter boots into setup mode and serves the editor immediately.
- If `serve.api_key` is configured, the admin UI uses the browser's built-in Basic auth popup and checks that password against `serve.api_key`.
- The UI supports raw `config.yaml` and `.env` editing for a low-friction V1 workflow.
- Validation uses the same backend parser and Pydantic config models as normal startup.
- Saved changes are written to disk atomically and require a restart before the running router picks them up.
- Relative config paths are resolved from your current working directory, including when launched via `uvx`.
- Exchange logs include both the incoming request payload and an effective post-normalization payload when available (`request_effective` in JSONL entries).
- Set `LAZYROUTER_LOG_MESSAGE_CONTENT=0` to redact all `content` fields in logged request/response payloads while keeping structural metadata for debugging.
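The redaction toggled by `LAZYROUTER_LOG_MESSAGE_CONTENT=0` can be pictured like this (an illustrative sketch of the idea, not the actual logging code):

```python
def redact_content(obj):
    """Recursively replace every `content` value so logs keep structure only."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k == "content" else redact_content(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact_content(v) for v in obj]
    return obj

entry = {"model": "auto", "messages": [{"role": "user", "content": "secret"}]}
print(redact_content(entry))
```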
When `serve.show_model_prefix` is enabled, LazyRouter strips known `[model-name]` prefixes from assistant history before upstream calls. This works for plain string content and for assistant content part lists (for multimodal/tool-use style messages).
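A rough sketch of that stripping for both content shapes (illustrative only; the actual server implementation handles more edge cases):

```python
import re

PREFIX = re.compile(r"^\[[\w./-]+\]\s*")  # e.g. "[gpt-4o] " at the start

def strip_model_prefix(message):
    """Remove a leading [model-name] tag from assistant content."""
    content = message.get("content")
    if isinstance(content, str):
        message = {**message, "content": PREFIX.sub("", content)}
    elif isinstance(content, list):  # multimodal/tool-use content parts
        parts = [
            {**p, "text": PREFIX.sub("", p["text"])} if p.get("type") == "text" else p
            for p in content
        ]
        message = {**message, "content": parts}
    return message

msg = {"role": "assistant", "content": "[gpt-4o] Sure, here is the answer."}
print(strip_model_prefix(msg)["content"])  # → Sure, here is the answer.
```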
Edit your `.openclaw/openclaw.json` and add a LazyRouter provider under `models.providers`:
```json
{
  "models": {
    "providers": {
      "lazyrouter": {
        "baseUrl": "http://server-address:port/v1",
        "apiKey": "not-needed",
        "api": "openai-completions",
        "models": []
      }
    }
  }
}
```

Then set your agent primary model to:

```json
"agents": {
  "defaults": {
    "model": {
      "primary": "lazyrouter/auto"
    }
  }
}
```

```bash
curl -X POST http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain vector databases briefly"}]
  }'
```

- `GET /health` - Liveness check
- `GET /v1/models` - List available models
- `GET /v1/health-status` - Show cached health check results
- `GET /v1/health-check` - Run health check now and return results
- `POST /v1/chat/completions` - OpenAI-compatible chat endpoint
LazyRouter uses a lightweight LLM-based routing architecture:
```
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ Client Request  │────▶│   Router Model   │────▶│ Context Trimming │────▶│  LLM Provider   │
│  (model: auto)  │     │  (cheap & fast)  │     │  (token control) │     │  (via LiteLLM)  │
└─────────────────┘     └──────────────────┘     └──────────────────┘     └─────────────────┘
                                 │                                                 │
                                 │ selects best model                              │
                                 ▼                                                 ▼
                        ┌──────────────────┐                              ┌─────────────────┐
                        │ OpenAI/Anthropic │                              │    Response     │
                        │  Gemini/Custom   │                              │    to Client    │
                        └──────────────────┘                              └─────────────────┘
```
Key components:
- **LLMRouter** (`router.py`): Uses a cheap/fast model (e.g., GPT-4o-mini, Gemini Flash) to analyze requests and select the optimal model based on Elo ratings, pricing, and task complexity. Returns structured JSON with reasoning.
- **FastAPI Server** (`server.py`): OpenAI-compatible `/v1/chat/completions` endpoint with streaming support. Handles provider-specific message sanitization for Gemini/Anthropic.
- **Context Compression** (`context_compressor.py`): Trims conversation history to control token usage in long agent sessions. Configurable via `max_history_tokens` and `keep_recent_exchanges`.
- **Health Checker** (`health_checker.py`): Background task that periodically pings models and excludes unhealthy ones from routing decisions.
- **Retry Handler** (`retry_handler.py`): Automatic fallback to Elo-similar models on rate limits or errors. Exponential backoff retry when all models fail, tied to the health check interval.
- **Tool Cache** (`tool_cache.py`): Caches tool call ID to model mappings per session, enabling router bypass on tool continuations for lower latency.
- **LiteLLM Integration**: All provider calls go through LiteLLM with `drop_params=True` for automatic compatibility handling across OpenAI, Anthropic, and Gemini APIs.
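The exponential backoff used when every model is unavailable can be sketched as follows (illustrative; the real `retry_handler.py` ties its delays to the health check interval):

```python
def backoff_delays(base=1.0, cap=30.0, attempts=5):
    """Exponentially growing wait times, capped so retries never stall too long."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

for delay in backoff_delays():
    print(f"all models unavailable, retrying in {delay:.0f}s")
    # a real retry loop would time.sleep(delay) here before the next attempt
```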
```bash
uv run python tests/test_setup.py
uv run pytest -q
```

- `docs/README.md` (docs index)
- `docs/QUICKSTART.md`
- `docs/API_STYLES.md`
- `docs/QUICKSTART_API_STYLES.md`
- `docs/UV_GUIDE.md`
GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007
