Update README and docs for HN launch by waleedkadous · Pull Request #557 · cluesmith/codev

waleedkadous · 2026-02-25T11:12:11Z

Closes #556

Summary

Hero blockquote: Replaced R1 comparison scores (92-95 vs 12-15 out of 100) with real-world production data (106 PRs in 14 days, a 57-minute median time per feature) and R4 controlled comparison results
VIBE vs SPIR comparison: Updated the expandable analysis from R1 data (where VIBE scored 0% functionality) to R4 data (where both approaches produced working code — methodologically stronger). Shows consistent +1.2 quality advantage across 4 rounds on a 1-10 scale scored by 3 independent AI agents
Eating Our Own Dog Food: Added production metrics table from the Feb 2026 development analysis (106 PRs, 801 commits, 85% autonomous builders, 20 pre-merge bugs caught, $1.59 consultation cost/PR)
VIBE description: Fixed inaccurate claim of "working code" — R1 VIBE produced boilerplate only with 0% functionality
Table of contents: Fixed duplicate Quick Start entry, added Real-World Performance link

Linked docs reviewed (no changes needed)

All 5 linked docs were reviewed for currency:

docs/faq.md — current, accurate comparison of Codev concepts vs Claude Code concepts
docs/tips.md — current, all CLI examples and tips are accurate
docs/why.md — historical article about Codev's origin; model references (Opus 4.1) are accurate for when the experiment was run
codev/resources/cheatsheet.md — current, protocol/role/tool tables accurate
codev/resources/commands/overview.md — current, all command summaries accurate

Key decisions

R4 over R1 for primary comparison: R1 was dramatic (VIBE scored 0-15 on everything) but R4 is more credible for an HN audience since both approaches produced working code. R4's +1.2 delta is more defensible than R1's catastrophic VIBE failure.
Production data in hero: The development analysis numbers (106 PRs, 57min median) are more impressive to a technical audience than controlled experiment scores. Led with these.
Kept linked docs unchanged: All linked docs are current. docs/why.md is a historical article — updating model names would misrepresent the original experiment.

Test plan

Verify all internal links resolve (TOC anchors, doc links, resource file links)
Verify external links to GitHub repos are accessible
Confirm the expandable comparison section renders correctly on GitHub
Review production metrics table against source data in codev/resources/development-analysis-2026-02-17.md
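
The first check in this plan (TOC anchors resolving) could be partly automated. Below is a minimal sketch, not something this PR ships: `github_slug` and `broken_anchors` are hypothetical helper names, and the slug rule is only an approximation of how GitHub derives heading anchors (it ignores duplicate-heading suffixes, emoji, and headings inside fenced code blocks).

```python
import re


def github_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor rule (assumption, not the official algorithm)."""
    slug = heading.strip().lower()
    # Drop punctuation; keep word characters, hyphens, and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


def broken_anchors(markdown: str) -> list[str]:
    """Return in-page link targets (#anchor) that match no heading in the text."""
    headings = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    anchors = {github_slug(h) for h in headings}
    links = re.findall(r"\]\(#([^)]+)\)", markdown)
    return [link for link in links if link not in anchors]


# Illustrative input only; the real README contents are not reproduced here.
sample = (
    "# Title\n\n"
    "- [Quick Start](#quick-start)\n"
    "- [Real-World Performance](#real-world-performance)\n"
    "- [Missing](#missing)\n\n"
    "## Quick Start\n\n"
    "## Real-World Performance\n"
)
print(broken_anchors(sample))
```

Running the same function over the README text would list any TOC entries whose target heading is missing or renamed; doc links and external links would still need a separate check.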

- Replace R1 comparison data (92-95 vs 12-15) with R4 results (7.0 vs 5.8 on 1-10 scale, consistent +1.2 across 4 rounds)
- Add production metrics table from Feb 2026 development analysis: 106 PRs in 14 days, 57min median, 85% autonomous, 20 bugs caught
- Fix inaccurate VIBE description (claimed "working code" when R1 produced 0% functionality)
- Fix duplicate Quick Start entry in table of contents
- Link hero blockquote to both case study and production data sections

waleedkadous · 2026-02-25T11:12:43Z

Architect Review

Low risk — README only, +43/-35. Clean update:

Results banner now leads with real production data (106 PRs/14 days) instead of R1-only scores
R4 comparison table is much more credible than the old 0-vs-100 scoring
Production metrics section with hard numbers is compelling for HN
Good trim of filler content

Approved.

waleedkadous merged commit b228539 into main Feb 25, 2026
6 checks passed

waleedkadous mentioned this pull request Feb 25, 2026

Comprehensive documentation update for HN launch #558

Closed

waleedkadous merged 1 commit into main from builder/air-556-update-readme-and-docs-for-hn-
