AI Model Benchmarks 2026

Structured, machine-readable AI model data for agents, pipelines, and developers. Benchmarks, pricing, capabilities, routing — auto-updated daily.

🇷🇺 Russian version

119 models · 54 benchmarks · 26 embeddings · 97 capability profiles · auto-updated daily

Browse the portal · View the data · Methodology

Who this is for

AI agents and pipelines — structured JSON for automated model routing, cost optimization, and capability matching
Developers — "which model for my RAG / code review / classification task at my budget?"
AI tools (LangChain, LiteLLM, etc.) — ready-to-integrate routing tables, capability flags, and pricing data

The problem

Benchmark tables go stale the day they're published. Prices change weekly. New models appear monthly. Capabilities aren't tracked anywhere in structured form. Most comparisons show a single score per model with no date, no source link, and no way to know if the number is still valid.

What this solves

Per-score freshness dates — every benchmark value has a measured date and a source URL. You can verify any number yourself
Capability profiles — vision, tool calling, reasoning, structured output, max tokens — synced daily from OpenRouter API (97/119 models)
Auto-updated pricing — fetched daily from OpenRouter API via GitHub Actions (see the workflow)
Task routing — not just "best model" but "best model for your specific task at your budget" (25 task categories, 4 tiers including quality, budget, and free)
Benchmark lifecycle — which benchmarks still separate models (active) vs which are noise (saturated/dead/contaminated)
No self-reported scores — all data from 16 independent sources (Scale AI SEAL, Artificial Analysis, BFCL, LM Arena, LiveBench...)
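
As a rough illustration of how per-score freshness dates can be used, the sketch below computes the age of each score and lists the ones older than a threshold. It assumes a model record shaped like the data/models.json excerpt in this README. The helper names and the 6-month threshold are hypothetical and are not taken from scripts/validate.py.

```python
from datetime import date


def parse_measured(measured: str) -> date:
    """Parse 'YYYY-MM' or 'YYYY-MM-DD'; a missing day defaults to the 1st."""
    parts = [int(p) for p in measured.split("-")]
    if len(parts) == 2:
        parts.append(1)
    year, month, day = parts
    return date(year, month, day)


def age_in_months(measured: str, today: date) -> int:
    """Whole calendar months between the measured date and `today`."""
    d = parse_measured(measured)
    return (today.year - d.year) * 12 + (today.month - d.month)


def stale_scores(model: dict, today: date, max_age_months: int = 6) -> list[str]:
    """Return benchmark keys whose score is older than the threshold (assumed: 6 months)."""
    return [
        name
        for name, score in model.get("scores", {}).items()
        if age_in_months(score["measured"], today) > max_age_months
    ]


# Sample record shaped like the models.json excerpt in this README.
model = {
    "id": "claude-opus-4-6",
    "scores": {
        "swe_v": {"value": 80.8, "measured": "2026-03", "source": "https://..."},
    },
}

print(stale_scores(model, today=date(2026, 4, 15)))  # score is 1 month old
print(stale_scores(model, today=date(2026, 12, 1)))  # score is 9 months old
```

Passing `today` explicitly keeps the check deterministic, which makes it easier to test than a call to `date.today()` inside the function.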

Why trust this data

Every score is auditable:

Source URL — click through to the original leaderboard or paper
Measured date — know exactly when the score was collected
Automated validation — scripts/validate.py enforces schema, catches missing sources, flags stale scores
Git history — every change is tracked, every update has a commit
CI pipeline — daily price updates + PR validation via GitHub Actions
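
A minimal sketch of the kind of audit described above, assuming each score is an object with value, measured, and source fields as in the data/models.json excerpt. The function name, the sample records, and the exact rules are illustrative assumptions. The real checks live in scripts/validate.py and may differ.

```python
import re

# Accepts YYYY-MM or YYYY-MM-DD, the two date formats the contribution rules allow.
MEASURED_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?$")


def audit_scores(model: dict) -> list[str]:
    """Return human-readable problems found in one model's scores."""
    problems = []
    for name, score in model.get("scores", {}).items():
        source = str(score.get("source") or "")
        measured = str(score.get("measured") or "")
        if not source.startswith(("http://", "https://")):
            problems.append(f"{model['id']}/{name}: missing or non-URL source")
        if not MEASURED_RE.match(measured):
            problems.append(f"{model['id']}/{name}: measured date is not YYYY-MM or YYYY-MM-DD")
    return problems


# Both records are hypothetical and exist only to exercise the checks.
good = {
    "id": "example-model-a",
    "scores": {
        "swe_v": {"value": 1.0, "measured": "2026-03", "source": "https://example.com/leaderboard"},
    },
}
bad = {
    "id": "example-model-b",
    "scores": {"swe_v": {"value": 1.0, "measured": "March 2026"}},
}

for record in (good, bad):
    print(record["id"], audit_scores(record))
```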

Quick start

Browse the data: data/models.json, data/embeddings.json, data/benchmarks.json
Find the right model: data/routing.json — KING picks by category, quick decision matrix
Understand the methodology: docs/METHODOLOGY.md

For agents and pipelines

Every model in data/models.json includes structured fields your code can consume directly:

{
  "id": "claude-opus-4-6",
  "context_length": 1000000,
  "max_output_tokens": 128000,
  "pricing": { "input": 5.0, "output": 25.0 },
  "capabilities": {
    "vision": true,
    "tool_calling": true,
    "reasoning": true,
    "structured_output": true,
    "json_mode": true
  },
  "scores": {
    "swe_v": { "value": 80.8, "measured": "2026-03", "source": "https://..." }
  }
}
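
One way these fields might be consumed is a capability and budget filter. The sketch below is an assumption-laden example, not part of the repository: `supports`, `input_price`, and `filter_models` are hypothetical helpers, the second sample record is invented, and prices are compared in whatever unit the dataset uses.

```python
def supports(model: dict, required: set[str]) -> bool:
    """True if every required capability flag is present and true."""
    caps = model.get("capabilities", {})
    return all(caps.get(name) is True for name in required)


def input_price(model: dict) -> float:
    """Input price, or infinity when the record has no usable price."""
    price = model.get("pricing", {}).get("input")
    return float(price) if price is not None else float("inf")


def filter_models(models, required, max_input_price=None, min_context=0):
    """Keep models that meet all constraints, cheapest input price first."""
    matches = []
    for m in models:
        if not supports(m, required):
            continue
        if m.get("context_length", 0) < min_context:
            continue
        if max_input_price is not None and input_price(m) > max_input_price:
            continue
        matches.append(m)
    return sorted(matches, key=input_price)


models = [
    {   # Record from the excerpt above (scores omitted).
        "id": "claude-opus-4-6",
        "context_length": 1000000,
        "pricing": {"input": 5.0, "output": 25.0},
        "capabilities": {"vision": True, "tool_calling": True, "reasoning": True,
                         "structured_output": True, "json_mode": True},
    },
    {   # Hypothetical record, invented for this example.
        "id": "example-small-model",
        "context_length": 128000,
        "pricing": {"input": 0.5, "output": 1.5},
        "capabilities": {"vision": False, "tool_calling": True, "structured_output": True},
    },
]

picked = filter_models(models, {"tool_calling", "structured_output"}, max_input_price=10.0)
print([m["id"] for m in picked])
```

Treating a missing capability flag as unsupported is the conservative choice here, since capability profiles cover 97 of the 119 models.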

Use data/routing.json → quick_matrix for automated model selection:

# Example: pick the best model for a task
import json

with open("data/routing.json", encoding="utf-8") as f:
    routing = json.load(f)

for entry in routing["quick_matrix"]:
    if entry["task"] == "Write/review code":
        print(entry["use"], entry["backup"], entry["free"])
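
For pipelines that want a fallback chain rather than a printout, the lookup can be wrapped in a small helper. This is only a sketch: `candidates_for` is a hypothetical name, and the sample entries assume the JSON values match the display names shown in the routing table in this README.

```python
def candidates_for(quick_matrix: list[dict], task: str) -> list[str]:
    """Return the use, backup, and free picks for a task, in that order."""
    wanted = task.casefold()
    for entry in quick_matrix:
        if entry["task"].casefold() == wanted:
            picks = [entry.get(key) for key in ("use", "backup", "free")]
            # Drop slots that are missing or empty.
            return [p for p in picks if p]
    raise KeyError(f"no quick_matrix entry for task {task!r}")


# Inline sample with the same keys the loop above reads from routing.json.
quick_matrix = [
    {"task": "Write/review code", "use": "Claude Sonnet 4.6",
     "backup": "MiniMax M2.5", "free": "MiniMax M2.5 FREE"},
    {"task": "Complex reasoning", "use": "Claude Opus 4.6",
     "backup": "Gemini 3.1 Pro", "free": "Gemini CLI"},
]

print(candidates_for(quick_matrix, "complex reasoning"))
```

Raising `KeyError` for an unknown task, instead of returning an empty list, makes a typo in the task name fail loudly.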

Data structure

data/
  models.json              — 119 models: scores, pricing, capabilities, metadata
  manual_capabilities.json — 15 top models: knowledge cutoff, caching, effective context
  embeddings.json          — 26 embedding models with MTEB scores and use-case routing
  benchmarks.json          — 54 benchmarks with lifecycle status
  routing.json             — Task routing: KING picks, FREE routing, quick decision matrix
  pricing.json             — Cache pricing by provider (auto-updated daily)

scripts/
  sync_capabilities.py         — Daily: pull capabilities from OpenRouter API
  fetch_openrouter_prices.py   — Daily: pull pricing from OpenRouter API
  generate_portal.py           — Daily: regenerate portal HTML from JSON
  validate.py                  — CI: schema validation, source URLs, freshness, routing refs
  generate_md.py               — Generate markdown tables from JSON data

How to use

Find the right model for a task

Check data/routing.json, section quick_matrix. Example entries:

Task	Use	Backup	Free
Write/review code	Claude Sonnet 4.6	MiniMax M2.5	MiniMax M2.5 FREE
Complex reasoning	Claude Opus 4.6	Gemini 3.1 Pro	Gemini CLI
Batch classification	Qwen CLI	MiniMax M2.5	Both free
Long document (>200K)	Gemini 3.1 Pro (1M)	MiniMax 01 (1M)	Gemini CLI (2M)
EU compliance	Mistral Large	Mistral Medium	—

Full routing table: data/routing.json → quick_matrix (25 task categories).

# Generate the full markdown reference
python scripts/generate_md.py > MODEL_BENCHMARKS.md

Stay current on pricing

Pricing refreshes daily through the GitHub Actions workflow. To update it manually, run the fetch script:

python scripts/fetch_openrouter_prices.py
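
Once pricing is current, the pricing field of a model record can feed a quick cost estimate. The sketch below assumes the input and output prices are USD per 1 million tokens. This README does not state the unit, so check docs/METHODOLOGY.md before relying on the numbers. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(pricing: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost. ASSUMPTION: prices are USD per 1M tokens."""
    per_million = 1_000_000
    input_cost = (input_tokens / per_million) * pricing["input"]
    output_cost = (output_tokens / per_million) * pricing["output"]
    return input_cost + output_cost


# Pricing values from the claude-opus-4-6 excerpt in this README.
pricing = {"input": 5.0, "output": 25.0}

cost = estimate_cost(pricing, input_tokens=200_000, output_tokens=10_000)
print(f"{cost:.2f}")  # 1.25 under the per-1M-token assumption
```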

Validate data integrity

python scripts/validate.py

How to contribute

Pull requests welcome. Requirements:

Source link required — every score must have a source URL pointing to the leaderboard or paper
Measured date required — use YYYY-MM format minimum, YYYY-MM-DD preferred
No self-reported scores — scores must come from independent benchmarking (not the model provider's own blog)
Validation required — run python scripts/validate.py and confirm it passes before opening the PR

See docs/CONTRIBUTING.md for full guidelines.

Update schedule

Daily (06:00 UTC): Prices fetched from OpenRouter API via GitHub Action
On every PR: Schema validation via GitHub Action
Weekly (manual): Benchmark scores reviewed against leaderboards, freshness dates updated

Sources

Scores come from: LM Council, Artificial Analysis, Scale AI SEAL, BFCL V4, BenchLM.ai, RankSaga/Kaggle, Z.ai, MiniMax, OpenRouter official, AIModelsMap, Awesome Agents, MedQA, VALS.ai, FDA, PricePerToken, MTEB Leaderboard, Prem.ai, Mixpeek.

See docs/METHODOLOGY.md for full source list and trust hierarchy.

License

MIT — see LICENSE
