Add script and github workflow to ratchet coverage to 100%#142
Add script and github workflow to ratchet coverage to 100%#142
Conversation
d163f57 to
cc76891
Compare
Replace the weak --fail-under-lines 30 threshold with a script that parses LCOV data and fails on any genuinely uncovered production code. Allowed exclusions: structural syntax (closing braces), unreachable!(), cfg(test) code, and #[ignore]d test bodies. The script also merges coverage from TempRustProject subprocess binaries by passing them as --object to llvm-cov export. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Exclude panic!() lines and multi-line macro continuations from coverage - Add // nocov (line) and // nocov start/end (block) annotations - Add .github/coverage-ratchet.json tracking nocov line count - CI auto-removes unnecessary annotations, auto-updates ratchet on PRs - Add coverage annotation policy to .claude/CLAUDE.md - Annotate antithesis-only code and error paths with // nocov Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cc76891 to
5bfede7
Compare
When the ratchet tightens on main (e.g. after a merge removes covered code), create a PR to update the ratchet file instead of failing the build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Annotations added based on CI coverage output. The ratchet is set to 337 excluded lines. As tests are added, the coverage script will auto-remove unnecessary annotations and tighten the ratchet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add // nocov to 112 lines that CI reported as uncovered (lines where
cargo fmt moved the inline comment to a standalone line, which were
then removed since structural syntax like { is already auto-excluded).
Update ratchet to 340.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lines like function signatures, while/if/match headers, and where clauses can't have trailing comments per cargo fmt. Wrap these in // nocov start/end blocks instead. Ratchet at 453. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLVM sometimes reports coverage for lines containing only nocov block markers (start/end), particularly when cargo fmt places them near closing braces. Treat any line containing // nocov as excluded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add comment explaining zizmor ignore for persist-credentials - Simplify bot email in CI workflow - Remove RELEASE.md (no release needed) - Fix docstring wording: "specific" not "nuanced", "allowed to be" - Remove panic!() auto-exclusion; add nocov to uncovered panic lines - Replace consecutive inline nocov with start/end blocks - Clean up format_backtrace and other messy nocov patterns in runner.rs - Add check_consecutive_nocov() to fail on 3+ consecutive inline nocov - Move "don't blindly add nocov" guidance to CLAUDE.md - Simplify tool output messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These panic! calls (STOP_TEST_STRING, SERVER_CRASHED_MESSAGE, etc.) were previously auto-excluded but are now handled via explicit nocov annotations after removing the panic! auto-exclusion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wrap in nocov blocks instead of inline annotations that cargo fmt moves to standalone comment lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add scripts/check-nocov-style.py as a lint check (not coverage) - Checks: 3+ consecutive inline nocov, inline adjacent to block markers, end immediately followed by start - Add to justfile check-lint target - Fix all existing violations: expand blocks to absorb adjacent inline annotations, merge 13 adjacent end/start pairs - Remove check_consecutive_nocov from coverage script Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lint job doesn't install uv, and this script has no dependencies beyond the standard library. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Blocks with only one line between start/end should use inline // nocov instead. Added the check to check-nocov-style.py and converted 4 existing single-line blocks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lines ending with { or expression continuations can't take inline
// nocov (cargo fmt moves the comment). Only flag single-line blocks
where the content line ends with ; or , (statements that could take
inline comments).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix git add -u '*.rs' to git add -u (glob didn't match subdirs) - Add GH_TOKEN env for gh pr create step - Move nocov end markers in packet.rs to own lines - Clean up nocov blocks in packet.rs to wrap entire if blocks - Add lint rule: start/end markers must be on their own lines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add lint rule 6: inline // nocov inside a block is redundant. Removed 6 such annotations from collections.rs and value.rs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
I like the semantics here! I'm totally good with a ratchet. 1k lines of python slop, even in lint scripts, makes me sad though and I'm registering a desire to do a (light) pass over them |
Fine by me! Although I am unlikely to do more than tell Claude to fix it in this case. I do think this is an area where we should be moderately slop-tolerant, because the failure modes are so minor. (Side note: I would count it as more like 500 lines of Python. Most of that is comments, which I have given a bit more attention to and they're definitely in Claude voice but seem accurate to me) |
And the output is constrained and easily-verifiable. Agreed. |
Claude loves avoiding coverage, and it snuck in a
--fail-under-lines 30for what should have been a--fail-under-lines 100. This is partly because coverage in rust is sortof weird and broken and insisting on 100 is actually a pain in the ass for forgiveable reasons. I've brought in a custom coverage script I have from another project which parses coverage output and adds exclusion rules, and adapted it to the local context.There's a lot of missing coverage right now. I tried to get Claude to fix this with a whole bunch of new tests but unfortunately the result was in the end mostly slop, and was better thrown away than used. Sorry for how much I made you look at of it before conceding this point, Liam.
So this PR instead adopts the traditional solution to this problem: A ratchet.
We allow nocov annotations (I am not happy about this), but track the total number of lines excluded by nocov, and the build checks that we are under that amount, and fails if we're not. Humans are allowed to raise this number if necessary, but Claude has been told that it is not (and may or may not obey this. We should keep an eye out).
Ideally (according to me) we would get this number down to zero over time, but the most important thing is to keep the amount of untested code under control, even if we never get it all the way to 0.
This also adds a lint check that coverage annotations are used in a sensible way, because claude got super weird about it and put in all sorts of slightly bizarre patterns.
Reviewing notes:
The interesting parts of this is all in the logic really. I'm much more interested in whether you're happy with the behaviour than the code.
The rust code should have no changes other than a bunch of coverage annotations (I deliberately decided to not add any tests in this PR, after previous attempts to add tests went awry).
The Python scripts are janky crap. I've read them through and they fall short of slop, but not by much. I also don't care that much because they're completely non-production-facing and are just build orchestration. If you've got strong feelings about any of them, please feel free to object.