
Commit 6b43355

Sripadkarne and claude authored
Update docs with latest benchmark results and blog post fixes (#78)
* Update docs with latest benchmark results and blog post fixes

  - benchmark-results/index.md: All tables updated with corrected numbers, added GPQA thinking effort ablation section
  - blog technical-deep-dive: Updated budget alternatives, algorithm comparison, selector summary, Opus+Opus fix, cumulative cost wording
  - mkdocs.yml: Minor config updates

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add topology assumption tradeoff for Hill Climbing vs Arm Elimination

  Arm Elimination is assumption-free (uses only observed data), while Hill Climbing requires a hand-crafted model ranking upfront. Also split LM Proposal into its own paragraph.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add server latency column to HotpotQA and MathQA tables

  Added Avg Latency (s) column to all 6 tables (Top 15, Bottom 15, Full 81) for both 2-tuple benchmarks using server-side latency from cache.db results.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix HotpotQA Bottom 15 rank offset (was off by 1 vs Full 81)

  Kimi + Haiku 4.5 is rank 66, not 67. Bottom 15 now correctly starts at rank 67 and matches the Full 81 table.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Sort all selector comparison tables by Mean Accuracy descending

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b989cf3 commit 6b43355

3 files changed

Lines changed: 334 additions & 314 deletions
