Commit 6b43355

Update docs with latest benchmark results and blog post fixes (#78)

* Update docs with latest benchmark results and blog post fixes

  - benchmark-results/index.md: All tables updated with corrected numbers, added GPQA thinking effort ablation section
  - blog technical-deep-dive: Updated budget alternatives, algorithm comparison, selector summary, Opus+Opus fix, cumulative cost wording
  - mkdocs.yml: Minor config updates

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add topology assumption tradeoff for Hill Climbing vs Arm Elimination

  Arm Elimination is assumption-free (uses only observed data), while Hill Climbing requires a hand-crafted model ranking upfront. Also split LM Proposal into its own paragraph.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add server latency column to HotpotQA and MathQA tables

  Added Avg Latency (s) column to all 6 tables (Top 15, Bottom 15, Full 81) for both 2-tuple benchmarks using server-side latency from cache.db results.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix HotpotQA Bottom 15 rank offset (was off by 1 vs Full 81)

  Kimi + Haiku 4.5 is rank 66, not 67. Bottom 15 now correctly starts at rank 67 and matches the Full 81 table.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Sort all selector comparison tables by Mean Accuracy descending

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent b989cf3, commit 6b43355

3 files changed

docs
- benchmark-results
  - index.md
- blog/posts
  - technical-deep-dive.md
mkdocs.yml
