Commit b8f25b6

sjarmak and claude committed
docs: regenerate official results browser pages
Refreshed all 1127 audit, run, and suite pages in docs/official_results/ via export_official_results.py to reflect ccb_feature/ccb_refactor split and latest promoted run data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent d75b1af commit b8f25b6

2,312 files changed (+1,875,899 / -83,239 lines)

docs/official_results/README.md

Lines changed: 12 additions & 9 deletions
@@ -2,7 +2,7 @@
 
 This bundle is generated from `runs/official/` and includes only valid scored tasks (`passed`/`failed` with numeric reward).
 
-Generated: `2026-02-27T13:52:05.485254+00:00`
+Generated: `2026-02-28T20:57:07.600629+00:00`
 
 ## Local Browse
 
@@ -17,16 +17,16 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 
 | Suite | Config | Valid Tasks | Min Required | Mean Reward | Pass Rate | Coverage |
 |---|---|---:|---:|---:|---:|---|
-| [ccb_build](suites/ccb_build.md) | `baseline-local-direct` | 25 | 25 | 0.513 | 0.800 | ok |
-| [ccb_build](suites/ccb_build.md) | `mcp-remote-direct` | 25 | 25 | 0.372 | 0.640 | ok |
+| [ccb_build](suites/ccb_build.md) | `baseline-local-direct` | 24 | 25 | 0.534 | 0.833 | FLAG: below minimum |
+| [ccb_build](suites/ccb_build.md) | `mcp-remote-direct` | 24 | 25 | 0.388 | 0.667 | FLAG: below minimum |
 | [ccb_debug](suites/ccb_debug.md) | `baseline-local-direct` | 20 | 20 | 0.670 | 1.000 | ok |
 | [ccb_debug](suites/ccb_debug.md) | `mcp-remote-direct` | 20 | 20 | 0.487 | 0.600 | ok |
 | [ccb_design](suites/ccb_design.md) | `baseline-local-direct` | 20 | 20 | 0.753 | 0.950 | ok |
 | [ccb_design](suites/ccb_design.md) | `mcp-remote-direct` | 20 | 20 | 0.718 | 1.000 | ok |
 | [ccb_document](suites/ccb_document.md) | `baseline-local-direct` | 20 | 20 | 0.847 | 1.000 | ok |
 | [ccb_document](suites/ccb_document.md) | `mcp-remote-direct` | 25 | 20 | 0.802 | 1.000 | ok |
-| [ccb_fix](suites/ccb_fix.md) | `baseline-local-direct` | 28 | 25 | 0.428 | 0.571 | ok |
-| [ccb_fix](suites/ccb_fix.md) | `mcp-remote-direct` | 28 | 25 | 0.467 | 0.571 | ok |
+| [ccb_fix](suites/ccb_fix.md) | `baseline-local-direct` | 28 | 25 | 0.421 | 0.571 | ok |
+| [ccb_fix](suites/ccb_fix.md) | `mcp-remote-direct` | 53 | 25 | 0.526 | 0.642 | ok |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-artifact` | 1 | 21 | 0.375 | 1.000 | FLAG: below minimum |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-direct` | 6 | 21 | 0.668 | 1.000 | FLAG: below minimum |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `mcp-remote-artifact` | 1 | 21 | 0.742 | 1.000 | FLAG: below minimum |
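
Review note: every row in the suite table above is consistent with a simple rule for the Coverage column, namely `ok` when Valid Tasks is at least Min Required and `FLAG: below minimum` otherwise. The sketch below only illustrates that inferred rule. The function name `coverage_status` is hypothetical and is not taken from `export_official_results.py`, whose actual logic is not shown in this diff.

```python
def coverage_status(valid_tasks: int, min_required: int) -> str:
    """Return the Coverage cell text for one suite/config row.

    Assumed rule, inferred from the table rows only:
    "ok" when valid_tasks >= min_required, otherwise a flag.
    """
    return "ok" if valid_tasks >= min_required else "FLAG: below minimum"


# (suite, config, valid tasks, min required) copied from rows in the diff.
rows = [
    ("ccb_build", "baseline-local-direct", 24, 25),
    ("ccb_document", "mcp-remote-direct", 25, 20),
    ("ccb_mcp_compliance", "baseline-local-direct", 6, 21),
]

for suite, config, valid, minimum in rows:
    print(f"{suite} / {config}: {coverage_status(valid, minimum)}")
```

Under this reading, the two `ccb_build` rows flip from `ok` to `FLAG: below minimum` purely because Valid Tasks dropped from 25 to 24 against a minimum of 25.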
@@ -78,10 +78,10 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 
 | Run | Suite | Config | Valid Tasks | Mean Reward | Pass Rate |
 |---|---|---|---:|---:|---:|
-| [build_haiku_20260223_124805](runs/build_haiku_20260223_124805.md) | `ccb_build` | `baseline-local-direct` | 19 | 0.511 | 0.789 |
-| [build_haiku_20260223_124805](runs/build_haiku_20260223_124805.md) | `ccb_build` | `mcp-remote-direct` | 25 | 0.372 | 0.640 |
-| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `baseline-local-direct` | 19 | 0.511 | 0.789 |
-| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `mcp-remote-direct` | 25 | 0.372 | 0.640 |
+| [build_haiku_20260223_124805](runs/build_haiku_20260223_124805.md) | `ccb_build` | `baseline-local-direct` | 18 | 0.540 | 0.833 |
+| [build_haiku_20260223_124805](runs/build_haiku_20260223_124805.md) | `ccb_build` | `mcp-remote-direct` | 24 | 0.388 | 0.667 |
+| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `baseline-local-direct` | 18 | 0.540 | 0.833 |
+| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `mcp-remote-direct` | 24 | 0.388 | 0.667 |
 | [ccb_build_haiku_20260225_234223](runs/ccb_build_haiku_20260225_234223.md) | `ccb_build` | `baseline-local-direct` | 1 | 0.820 | 1.000 |
 | [ccb_build_haiku_20260226_015500_backfill](runs/ccb_build_haiku_20260226_015500_backfill.md) | `ccb_build` | `baseline-local-direct` | 1 | 0.820 | 1.000 |
 | [ccb_build_haiku_20260227_baseline_gapfill](runs/ccb_build_haiku_20260227_baseline_gapfill.md) | `ccb_build` | `baseline-local-direct` | 5 | 0.456 | 0.800 |
@@ -102,6 +102,9 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_fix_haiku_20260224_203138](runs/ccb_fix_haiku_20260224_203138.md) | `ccb_fix` | `mcp-remote-direct` | 1 | 0.740 | 1.000 |
 | [ccb_fix_haiku_20260226_015500_backfill](runs/ccb_fix_haiku_20260226_015500_backfill.md) | `ccb_fix` | `baseline-local-direct` | 2 | 0.235 | 0.500 |
 | [ccb_fix_haiku_20260226_015500_backfill](runs/ccb_fix_haiku_20260226_015500_backfill.md) | `ccb_fix` | `mcp-remote-direct` | 1 | 0.667 | 1.000 |
+| [ccb_fix_haiku_20260227_151833](runs/ccb_fix_haiku_20260227_151833.md) | `ccb_fix` | `baseline-local-direct` | 1 | 0.000 | 0.000 |
+| [ccb_fix_haiku_20260228_185835](runs/ccb_fix_haiku_20260228_185835.md) | `ccb_fix` | `baseline-local-direct` | 25 | 0.471 | 0.640 |
+| [ccb_fix_haiku_20260228_185835](runs/ccb_fix_haiku_20260228_185835.md) | `ccb_fix` | `mcp-remote-direct` | 25 | 0.592 | 0.720 |
 | [ccb_mcp_compliance_haiku_20260224_181919](runs/ccb_mcp_compliance_haiku_20260224_181919.md) | `ccb_mcp_compliance` | `mcp-remote-artifact` | 1 | 0.742 | 1.000 |
 | [ccb_mcp_compliance_haiku_20260225_011700](runs/ccb_mcp_compliance_haiku_20260225_011700.md) | `ccb_mcp_compliance` | `baseline-local-artifact` | 1 | 0.375 | 1.000 |
 | [ccb_mcp_compliance_haiku_20260226_035515_variance](runs/ccb_mcp_compliance_haiku_20260226_035515_variance.md) | `ccb_mcp_compliance` | `baseline-local-direct` | 1 | 0.386 | 1.000 |
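
Review note: the new `ccb_fix` / `mcp-remote-direct` suite row (53 tasks, mean reward 0.526, pass rate 0.642) can be sanity-checked against the figures in this diff. It matches a task-weighted pooling of the previous suite row (28 tasks, 0.467, 0.571) with the new `ccb_fix_haiku_20260228_185835` run (25 tasks, 0.592, 0.720). This is an arithmetic observation, not a confirmed description of how `export_official_results.py` aggregates. The `baseline-local-direct` row stays at 28 tasks, so pooling clearly does not apply there. The helper names below are hypothetical.

```python
def pooled_mean(groups):
    """Task-weighted mean over (n_tasks, mean_value) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * value for n, value in groups) / total


def pooled_pass_rate(groups):
    """Pool pass rates by first recovering integer pass counts.

    The published rates are rounded to three decimals, so each
    (n_tasks, rate) pair is converted back to a whole number of passes.
    """
    total = sum(n for n, _ in groups)
    passes = sum(round(n * rate) for n, rate in groups)
    return passes / total


# (tasks, mean reward) and (tasks, pass rate) as printed in the diff.
reward = pooled_mean([(28, 0.467), (25, 0.592)])
pass_rate = pooled_pass_rate([(28, 0.571), (25, 0.720)])

print(f"mean reward: {reward:.3f}")
print(f"pass rate:   {pass_rate:.3f}")
```

Recovering integer pass counts matters here: pooling the rounded rates directly gives a slightly different third decimal than pooling 16 of 28 and 18 of 25 passes.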
