Structured, machine-readable AI model data for agents, pipelines, and developers. Benchmarks, pricing, capabilities, routing — auto-updated daily.
119 models · 54 benchmarks · 26 embeddings · 97 capability profiles · auto-updated daily
Browse the portal · View the data · Methodology
- AI agents and pipelines — structured JSON for automated model routing, cost optimization, and capability matching
- Developers — "which model for my RAG / code review / classification task at my budget?"
- AI tools (LangChain, LiteLLM, etc.) — ready-to-integrate routing tables, capability flags, and pricing data
Benchmark tables go stale the day they're published. Prices change weekly. New models appear monthly. Capabilities aren't tracked anywhere in structured form. Most comparisons show a single score per model with no date, no source link, and no way to know if the number is still valid.
- Per-score freshness dates — every benchmark value has a `measured` date and a `source` URL, so you can verify any number yourself
- Capability profiles — vision, tool calling, reasoning, structured output, max tokens — synced daily from the OpenRouter API (97/119 models)
- Auto-updated pricing — fetched daily from OpenRouter API via GitHub Actions (see the workflow)
- Task routing — not just "best model" but "best model for your specific task at your budget" (25 task categories; quality, budget, and free tiers)
- Benchmark lifecycle — which benchmarks still separate models (active) vs which are noise (saturated/dead/contaminated)
- No self-reported scores — all data from 16 independent sources (Scale AI SEAL, Artificial Analysis, BFCL, LM Arena, LiveBench...)
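As an illustration of how the capability flags and pricing can be consumed programmatically, here is a minimal filtering sketch. The record shapes mirror the sample object shown later in this README; the helper name and the second record are hypothetical.

```python
# Sample records shaped like entries in data/models.json
# (the second record is invented for illustration).
models = [
    {"id": "claude-opus-4-6",
     "pricing": {"input": 5.0, "output": 25.0},
     "capabilities": {"vision": True, "tool_calling": True}},
    {"id": "example-budget-model",
     "pricing": {"input": 0.5, "output": 1.5},
     "capabilities": {"vision": False, "tool_calling": True}},
]

def vision_models_under(models, max_input_price):
    """IDs of vision + tool-calling models at or below a given input price."""
    return [
        m["id"]
        for m in models
        if m["capabilities"].get("vision")
        and m["capabilities"].get("tool_calling")
        and m["pricing"]["input"] <= max_input_price
    ]

print(vision_models_under(models, 5.0))
```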
Every score is auditable:
- Source URL — click through to the original leaderboard or paper
- Measured date — know exactly when the score was collected
- Automated validation — `scripts/validate.py` enforces the schema, catches missing sources, and flags stale scores
- Git history — every change is tracked, every update has a commit
- CI pipeline — daily price updates + PR validation via GitHub Actions
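A staleness check along these lines could look like the sketch below. The threshold and function are hypothetical; the actual logic lives in `scripts/validate.py`.

```python
from datetime import date

STALE_AFTER_MONTHS = 12  # hypothetical threshold

def is_stale(measured: str, today: date) -> bool:
    """True if a `measured` date (YYYY-MM or YYYY-MM-DD) exceeds the age threshold."""
    year, month = (int(p) for p in measured.split("-")[:2])
    age_months = (today.year - year) * 12 + (today.month - month)
    return age_months > STALE_AFTER_MONTHS

print(is_stale("2026-03", today=date(2026, 6, 1)))  # recent score
print(is_stale("2024-01", today=date(2026, 6, 1)))  # well past the threshold
```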
- Browse the data: `data/models.json`, `data/embeddings.json`, `data/benchmarks.json`
- Find the right model: `data/routing.json` — KING picks by category, quick decision matrix
- Understand the methodology: `docs/METHODOLOGY.md`
Every model in data/models.json includes structured fields your code can consume directly:
```json
{
  "id": "claude-opus-4-6",
  "context_length": 1000000,
  "max_output_tokens": 128000,
  "pricing": { "input": 5.0, "output": 25.0 },
  "capabilities": {
    "vision": true,
    "tool_calling": true,
    "reasoning": true,
    "structured_output": true,
    "json_mode": true
  },
  "scores": {
    "swe_v": { "value": 80.8, "measured": "2026-03", "source": "https://..." }
  }
}
```

Use `data/routing.json` → `quick_matrix` for automated model selection:

```python
import json

# Example: pick the best model for a task
with open("data/routing.json") as f:
    routing = json.load(f)

for entry in routing["quick_matrix"]:
    if entry["task"] == "Write/review code":
        print(entry["use"], entry["backup"], entry["free"])
```

```
data/
  models.json                — 119 models: scores, pricing, capabilities, metadata
  manual_capabilities.json   — 15 top models: knowledge cutoff, caching, effective context
  embeddings.json            — 26 embedding models with MTEB scores and use-case routing
  benchmarks.json            — 54 benchmarks with lifecycle status
  routing.json               — task routing: KING picks, FREE routing, quick decision matrix
  pricing.json               — cache pricing by provider (auto-updated daily)
scripts/
  sync_capabilities.py       — daily: pull capabilities from the OpenRouter API
  fetch_openrouter_prices.py — daily: pull pricing from the OpenRouter API
  generate_portal.py         — daily: regenerate portal HTML from JSON
  validate.py                — CI: schema validation, source URLs, freshness, routing refs
  generate_md.py             — generate markdown tables from JSON data
```
Check `data/routing.json`, section `quick_matrix`. Example entries:
| Task | Use | Backup | Free |
|---|---|---|---|
| Write/review code | Claude Sonnet 4.6 | MiniMax M2.5 | MiniMax M2.5 FREE |
| Complex reasoning | Claude Opus 4.6 | Gemini 3.1 Pro | Gemini CLI |
| Batch classification | Qwen CLI | MiniMax M2.5 | Both free |
| Long document (>200K) | Gemini 3.1 Pro (1M) | MiniMax 01 (1M) | Gemini CLI (2M) |
| EU compliance | Mistral Large | Mistral Medium | — |
Full routing table: data/routing.json → quick_matrix (25 task categories).
```bash
# Generate the full markdown reference
python scripts/generate_md.py > MODEL_BENCHMARKS.md
```

Pricing is updated by running the fetch script:

```bash
python scripts/fetch_openrouter_prices.py
```

Run validation with:

```bash
python scripts/validate.py
```

Pull requests welcome. Requirements:
- Source link required — every score must have a `source` URL pointing to the leaderboard or paper
- Measured date required — use `YYYY-MM` format minimum, `YYYY-MM-DD` preferred
- No self-reported scores — scores must come from independent benchmarking (not the model provider's own blog)
- Run `python scripts/validate.py` before submitting — PRs must pass validation
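The date-format rule can be checked with a simple pattern. This regex is an illustrative approximation, not the actual check in `scripts/validate.py`:

```python
import re

# Accepts YYYY-MM (minimum) and YYYY-MM-DD (preferred).
MEASURED_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?$")

def valid_measured(value: str) -> bool:
    """True if a measured date uses the required YYYY-MM or YYYY-MM-DD format."""
    return bool(MEASURED_RE.fullmatch(value))

print(valid_measured("2026-03"))     # minimum format
print(valid_measured("2026-03-15"))  # preferred format
print(valid_measured("March 2026"))  # rejected
```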
See docs/CONTRIBUTING.md for full guidelines.
- Daily (06:00 UTC): Prices fetched from OpenRouter API via GitHub Action
- On every PR: Schema validation via GitHub Action
- Weekly (manual): Benchmark scores reviewed against leaderboards, freshness dates updated
Scores come from: LM Council, Artificial Analysis, Scale AI SEAL, BFCL V4, BenchLM.ai, RankSaga/Kaggle, Z.ai, MiniMax, OpenRouter official, AIModelsMap, Awesome Agents, MedQA, VALS.ai, FDA, PricePerToken, MTEB Leaderboard, Prem.ai, Mixpeek.
See docs/METHODOLOGY.md for full source list and trust hierarchy.
MIT — see LICENSE