Update README and docs for HN launch #557

Merged
waleedkadous merged 1 commit into main from
builder/air-556-update-readme-and-docs-for-hn-
Feb 25, 2026

@waleedkadous
Contributor

Closes #556

Summary

  • Hero blockquote: Replaced R1 comparison scores (92-95 vs 12-15 out of 100) with real-world production data (106 PRs in 14 days, 57 min median per feature) and R4 controlled comparison results
  • VIBE vs SPIR comparison: Updated the expandable analysis from R1 data (where VIBE scored 0% functionality) to R4 data (where both approaches produced working code, which makes the comparison methodologically stronger). Shows a consistent +1.2 quality advantage across 4 rounds on a 1-10 scale, scored by 3 independent AI agents
  • Eating Our Own Dog Food: Added production metrics table from the Feb 2026 development analysis (106 PRs, 801 commits, 85% autonomous builders, 20 pre-merge bugs caught, $1.59 consultation cost/PR)
  • VIBE description: Fixed inaccurate claim of "working code"; R1 VIBE produced only boilerplate, with 0% functionality
  • Table of contents: Fixed duplicate Quick Start entry, added Real-World Performance link

Linked docs reviewed (no changes needed)

All 5 linked docs were reviewed for currency:

  • docs/faq.md — current, accurate comparison of Codev concepts vs Claude Code concepts
  • docs/tips.md — current, all CLI examples and tips are accurate
  • docs/why.md — historical article about Codev's origin; model references (Opus 4.1) are accurate for when the experiment was run
  • codev/resources/cheatsheet.md — current, protocol/role/tool tables accurate
  • codev/resources/commands/overview.md — current, all command summaries accurate

Key decisions

  1. R4 over R1 for primary comparison: R1 was dramatic (VIBE scored 0-15 on everything) but R4 is more credible for an HN audience since both approaches produced working code. R4's +1.2 delta is more defensible than R1's catastrophic VIBE failure.

  2. Production data in hero: The development analysis numbers (106 PRs, 57min median) are more impressive to a technical audience than controlled experiment scores. Led with these.

  3. Kept linked docs unchanged: All linked docs are current. docs/why.md is a historical article — updating model names would misrepresent the original experiment.

Test plan

  • Verify all internal links resolve (TOC anchors, doc links, resource file links)
  • Verify external links to GitHub repos are accessible
  • Confirm the expandable comparison section renders correctly on GitHub
  • Review production metrics table against source data in codev/resources/development-analysis-2026-02-17.md
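The first test-plan item (TOC anchors resolving) could be spot-checked with a small script. The following is a minimal sketch, not the project's tooling: the function names `github_slug` and `broken_anchors` are hypothetical, and the slug rule is a simplified approximation of how GitHub derives heading anchors (it ignores duplicate-heading suffixes and headings inside code fences).

```python
import re


def github_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor rule (simplified assumption)."""
    text = heading.strip().lower()
    # Drop punctuation, keeping word characters, hyphens, and spaces.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def broken_anchors(markdown: str) -> list[str]:
    """Return in-page link targets (#...) that match no heading slug."""
    headings = re.findall(
        r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE
    )
    slugs = {github_slug(h) for h in headings}
    # Collect targets of links written as [text](#anchor).
    targets = re.findall(r"\]\(#([^)\s]+)\)", markdown)
    return [t for t in targets if t not in slugs]
```

Running `broken_anchors(open("README.md").read())` from the repository root would list any TOC entries that point at a missing heading; relative doc links and external URLs would still need a separate check.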

- Replace R1 comparison data (92-95 vs 12-15) with R4 results (7.0 vs 5.8
  on 1-10 scale, consistent +1.2 across 4 rounds)
- Add production metrics table from Feb 2026 development analysis: 106 PRs
  in 14 days, 57min median, 85% autonomous, 20 bugs caught
- Fix inaccurate VIBE description (claimed "working code" when R1 produced
  0% functionality)
- Fix duplicate Quick Start entry in table of contents
- Link hero blockquote to both case study and production data sections
@waleedkadous
Contributor Author

Architect Review

Low risk — README only, +43/-35. Clean update:

  • Results banner now leads with real production data (106 PRs/14 days) instead of R1-only scores
  • R4 comparison table is much more credible than the old 0-vs-100 scoring
  • Production metrics section with hard numbers is compelling for HN
  • Good trim of filler content

Approved.


@waleedkadous waleedkadous merged commit b228539 into main Feb 25, 2026
6 checks passed