Add JS-first coverage reporting toolchain #523

charliecreates[bot] wants to merge 5 commits into `main` from
Conversation
Coverage summary (report-only)
Coverage inputs
Missing coverage summaries
orrery preview: https://pr-523.orrery-c4f.pages.dev/
The coverage toolchain wiring is generally coherent, but there are a few correctness/operational risks: the PR workflow’s event/permission model is not ideal for forks, artifact upload currently warns instead of failing when expected reports are missing, and the new guard/aggregator scripts rely on heuristics/hard-coded lists that are likely to drift over time. Tightening these areas will make the coverage reporting more reliable and lower-maintenance.
Additional notes (1)
- Maintainability
  `packages/backend-fake/package.json:19-24`
  The `test:coverage` scripts run `pnpm run --if-present pretest` and then `vitest run --coverage ...`. This is redundant in packages where `test` already implicitly triggers `pretest` (npm/pnpm lifecycle hooks) and also potentially inconsistent:
  `pnpm run test` will automatically run `pretest`. `pnpm run test:coverage` also runs `pretest` explicitly, but only because you manually invoked it.
  That can lead to drift (e.g., if a package relies on other lifecycle hooks like `posttest`, or if `pretest` is expensive and you'd rather rely on standard behavior).
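For concreteness, here is a hypothetical package manifest (not one from this repo) illustrating the duplication: `test` relies on the implicit `pretest` lifecycle hook, while `test:coverage` invokes it explicitly, so the two paths must be kept in sync by hand.

```json
{
  "scripts": {
    "pretest": "node ./scripts/build-fixtures.mjs",
    "test": "vitest run",
    "test:coverage": "pnpm run --if-present pretest && vitest run --coverage"
  }
}
```

If `posttest` were added later, `pnpm run test` would pick it up automatically but `test:coverage` would silently skip it.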
Summary of changes
What changed
CI / GitHub Actions
- Added a new PR workflow, `.github/workflows/coverage-pr.yml`, to:
  - run `pnpm coverage:js` and then `pnpm coverage:report`
  - post/replace a sticky PR comment containing `coverage/coverage-report.md`
  - upload coverage artifacts (`coverage-report.{json,md}` + `packages/*/coverage/coverage-summary.json`)
Repo scripts & docs
- Added `scripts/check-test-coverage-scripts.mjs` and wired it into the root `check:js` pipeline via `check:coverage-scripts`.
- Added `scripts/coverage/aggregate.mjs` to aggregate per-package `coverage-summary.json` files into `coverage/coverage-report.json` and `coverage/coverage-report.md`.
- Updated `CONTRIBUTING.md` and `scripts/README.md` with coverage lane usage and script descriptions.
Package / Turbo configuration
- Added package-level `test:coverage` scripts across several packages to run Vitest with V8 coverage and emit `coverage/coverage-summary.json`.
- Normalized Vitest to `^3.2.4` and added `@vitest/coverage-v8` where needed.
- Updated `turbo.json` to include a `test:coverage` pipeline with `coverage/**` outputs.
Ignore rules
- Updated `.gitignore` to ignore `/coverage`, `apps/*/coverage`, and `packages/*/coverage`.
@CharlieHelps thank you for this strong start to the PR. I have a few items to address. First is that we do not seem to have coverage numbers for

Secondly, the parity/non-parity lens was underspecified on my part and has turned out incorrect. Let me flesh this out further. Because the purpose of parity checking is to validate the parity of

Although parity testing is implemented with unit testing, it's a fundamentally different testing strategy than "classical" unit testing, though it can very clearly map to similar coverage metrics. The ultimate goal is to have high coverage independently in both "classical" testing and parity testing, so we need to track them separately.

Of course, it is also useful to just have the general "I ran all the unit tests, here's what got covered" view for a dead-simple sense of coverage - not only for the critical CSPICE wrapping portions, but everything else (including the parity testing machinery and the fake, which itself serves an important role in our test infrastructure).

Finally, within the context of what has already been implemented and is still relevant given my feedback, please address the review items mentioned here: #523 (review)
Thanks for the detailed clarification — super helpful. I pushed an update here: e7bae3c
If you want, I can do one additional pass to make
@CharlieHelps I need your help preparing this PR to merge. You should read the diff, related issues/docs, and PR reviews closely and then ensure ALL of the following is true:
It's your job to push commits to the PR to complete all of these tasks. You MUST wait to confirm that ALL of the above is true before stopping. I expect to be able to merge the PR after you are done. Additionally, please ensure that the re-calculated coverage numbers are posted
Force-pushed from 37f1932 to f48b6d5
Recalculated coverage for current head
Source:
Merge-readiness status:
Remaining blocker: repository policy requires an approving review (
@CharlieHelps thanks. Why is it that in this run the parity checking tests do not emit coverage summaries for those packages? If implemented correctly, parity checking should definitely be exercising both packages. I'm wondering if it's related to how we instantiate a concrete spice client, as there is some dynamic import magic that might be confusing the coverage provider. Can you look into this? That could in theory be impacting our other coverage numbers as well.
I dug into this and reproduced it locally on

This

So parity tests can still be exercising both backends, but this reporting flow doesn't attribute that cross-package execution to backend package summaries. That's why the metric is

On the dynamic import concern specifically: I don't see evidence that it's skewing these numbers. The parity artifacts omit both backends, and one backend path (

The broader caveat is real, though: these are per-package coverage summaries, not a merged system-level transitive coverage view. So cross-package execution generally won't be credited unless that package's own tests are run under coverage.

If you want, I can make a small follow-up tweak to the
@CharlieHelps I think the issue is that in your linux-arm64 devbox, the parity tests don't execute because there is no native cspice available. We'd probably get different results for it on the GitHub runners. That said, trying to get this working without your ability to run the tests is probably not worth our time when we have linux-arm64 support on the horizon. Let's just scope this down to the vanilla-standard coverage definition that we already support: run all unit tests, report all coverage numbers. The command for doing so that you can run will produce numbers; runners on GitHub, which run the parity tests, will produce different numbers - that's fine.
Thanks — I scoped this PR down to the vanilla unit-test coverage reporting path as requested. Updated in this push:
Lightweight validation run:
I did not re-run the full parity-sensitive local coverage flow in this arm64 devbox. Per the requested scope, differences between local numbers and GitHub runner numbers due to parity execution are acceptable.
@CharlieHelps please review all relevant issue and PR context, then conduct a thorough code review.

What “good” looks like

Workflow

Use this exact structure in the review summary:

**Blocking feedback**

**Non-blocking feedback**

Rules:
**Blocking feedback**

- `scripts/workspace-packages.mjs` fallback discovery can devolve into an expensive full-repo walk, which is risky given it runs under `check:js` (performance/CI time).
- The hand-rolled `pnpm-workspace.yaml` parser is brittle and can cause silent package discovery drift (correctness/maintainability).

**Non-blocking feedback**

- `check-test-coverage-scripts.mjs` may over-detect Vitest usage by treating any `@vitest/*` dependency as a Vitest-runner signal (policy false positives).
- `aggregateLens()` should be made defensive or have its “observed entries only” contract made explicit to prevent fragile future refactors.
- Consider aligning the coverage workflow Node version with the repo’s canonical CI/runtime baseline to avoid runtime skew.
Additional notes (1)
- Maintainability
  `scripts/check-test-coverage-scripts.mjs:9-18`
  `hasVitestDependency()` treats any `@vitest/*` dependency as indicating a Vitest test runner, but packages can legitimately depend on `@vitest/*` utilities without using Vitest as their test command (or could use a different runner). This can create a policy false-positive where `check:coverage-scripts` forces a `test:coverage` script even though Vitest isn’t the runner for that package.
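One way to tighten the heuristic is to require that Vitest actually appears in the package's own `test` script, not merely in its dependency lists. The sketch below assumes manifests are parsed `package.json` objects; the helper name `usesVitestRunner` is hypothetical, not a function from this repo.

```javascript
// Hypothetical stricter check: a package is only treated as a Vitest-runner
// package if it both depends on vitest and invokes it from its test script.
function usesVitestRunner(manifest) {
  const deps = {
    ...(manifest.dependencies ?? {}),
    ...(manifest.devDependencies ?? {}),
  };
  const hasDep = typeof deps.vitest === "string" && deps.vitest.trim().length > 0;
  const testScript = manifest.scripts?.test ?? "";
  return hasDep && /\bvitest\b/.test(testScript);
}

// A package that only consumes @vitest/* utilities is no longer flagged.
console.log(usesVitestRunner({
  devDependencies: { "@vitest/coverage-v8": "^3.2.4" },
  scripts: { test: "node --test" },
})); // false
console.log(usesVitestRunner({
  devDependencies: { vitest: "^3.2.4" },
  scripts: { test: "vitest run" },
})); // true
```

This would still miss packages that run Vitest via a shared config without a direct dependency, which is why checking for a `vitest.config.*` file is a reasonable complementary signal.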
Summary of changes
Overview
This PR introduces a JS-first, report-only coverage toolchain across the workspace.
CI / GitHub Actions
- Adds a PR workflow to run coverage and upload artifacts: `.github/workflows/coverage-pr.yml`
  - Runs `pnpm coverage` and `pnpm coverage:report`
  - Builds `coverage/coverage-comment.md`
  - Uploads aggregate + per-package `coverage-summary.json` as an artifact (fails if missing)
- Adds a trusted publisher workflow: `.github/workflows/coverage-pr-comment.yml`
  - Triggers on `workflow_run` of the Coverage PR workflow
  - Downloads the artifact and posts/updates a sticky PR comment
Coverage tooling
- Adds a shared Vitest coverage runner: `scripts/run-vitest-coverage.mjs`
- Adds workspace-aware package discovery utilities: `scripts/workspace-packages.mjs`
- Adds a guardrail to enforce `test:coverage` presence for Vitest packages (with a small exclusion set): `scripts/check-test-coverage-scripts.mjs`
- Adds an aggregator that produces JSON + Markdown reports from per-package `coverage-summary.json`: `scripts/coverage/aggregate.mjs`
Repo/package wiring
- Root `package.json` gains `coverage`, `coverage:js`, and `coverage:report` scripts; `check:js` now runs `check:coverage-scripts`.
- Multiple packages add `test:coverage` delegating to the shared runner and bump to `vitest@^3.2.4` (plus `@vitest/coverage-v8` where coverage is enabled).
- `turbo.json` adds a `test:coverage` pipeline with `coverage/**` outputs.
- `.gitignore` now ignores `/coverage` and per-app/package `coverage` directories.
- Docs updated (`CONTRIBUTING.md`, `scripts/README.md`) to describe the coverage lane and outputs.
@CharlieHelps I need your help preparing this PR to merge. You should read the diff, related issues/docs, and PR reviews closely and then ensure ALL of the following is true:
It's your job to push commits to the PR to complete all of these tasks. You MUST wait to confirm that ALL of the above is true before stopping. I expect to be able to merge the PR after you are done.
Final merge-readiness checkpoint for
If a maintainer can leave an approval, this should be ready to merge.
@CharlieHelps please review all relevant issue and PR context, then conduct a thorough code review.

What “good” looks like

Workflow

Use this exact structure in the review summary:

**Blocking feedback**

**Non-blocking feedback**

Rules:
**Blocking feedback**

- `coverage:js` is documented as matching `test:js` semantics, but it doesn’t; this will mislead contributors and cause lane drift.
- Root `coverage`/`coverage:js` scripts hard-code filters, creating a second source of truth that can drift from `test:coverage` presence and the aggregator’s discovered targets.

**Non-blocking feedback**

- Harden the trusted `workflow_run` comment publisher with an explicit `head_repository.full_name == github.repository` guard to minimize surprising write contexts.
- `run-vitest-coverage.mjs` explicitly runs `pretest`; consider relying on lifecycle semantics (or at least improve failure diagnostics when processes terminate by signal).
- The Vitest dependency-based guardrail scope may create false positives; consider an explicit opt-in flag shared by both guardrail and aggregator.
- Consider an opt-in verification mode to ensure `fs.glob` and fallback discovery yield identical manifest lists across Node versions.
Additional notes (3)
- Maintainability
  `scripts/run-vitest-coverage.mjs:1-41`
  `scripts/run-vitest-coverage.mjs` runs `pnpm run --if-present pretest` and then `pnpm exec vitest ...`. For packages where `test` (or `test:coverage`) already relies on the standard `pretest` lifecycle semantics, this can cause duplication and subtle drift (e.g., future addition of `posttest`, or `pretest` doing heavier work than intended for coverage runs).
  Also, using `spawnSync` for both commands is fine, but the current helper returns 1 on a non-numeric status even when `result.signal` is set; that can lose debugging signal.
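A minimal sketch of an exit-code helper that preserves the signal information instead of collapsing everything to 1. The names here are illustrative, not the repo's actual helper, and the 128 + signal-number convention is borrowed from shell semantics.

```javascript
// Illustrative helper: map a spawnSync-style result to an exit code,
// logging any terminating signal instead of silently discarding it.
function exitCodeFromResult(result, log = console.error) {
  if (typeof result.status === "number") return result.status;
  if (result.signal) {
    log(`child process terminated by signal ${result.signal}`);
    // Conventional shell encoding for SIGKILL (9) is 128 + 9 = 137.
    return result.signal === "SIGKILL" ? 137 : 1;
  }
  return 1;
}

console.log(exitCodeFromResult({ status: 0 })); // 0
console.log(exitCodeFromResult({ status: null, signal: "SIGKILL" })); // 137
```

Even just the log line is enough to turn a mysterious "exited 1" into an actionable OOM-kill diagnosis in CI.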
- Compatibility
  `scripts/workspace-packages.mjs:243-272`
  In `workspace-packages.mjs`, the Node `fs.glob` call uses `{ cwd, exclude: excludeManifestPatterns }`. Node’s `fs.glob` API accepts `exclude` as an array of glob patterns, but behavior differs across Node versions and the option surface is still relatively new. Given this script is part of `check:js`, a subtle incompatibility here would be painful. You already implemented a fallback walker, but the two code paths may not match exactly (especially around `!` negation semantics and pattern normalization).
- Readability
  `scripts/coverage/aggregate.mjs:66-75`
  In `finalizeSummaryTotals()`, you start from `emptySummaryTotals()` and then call `addMetric(totals[metricName], rawTotals?.[metricName])` before finalizing. This works, but it’s an unnecessary indirection and makes the function read like it’s aggregating multiple entries when it’s really just normalizing a single package summary. This matters because this file is now part of the “tooling surface area” that will get changed repeatedly; simpler code reduces future mistakes.
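The suggested simplification could read roughly like this. It is a sketch only: the exact metric names and the shape `aggregate.mjs` expects may differ, and `normalizeSummaryTotals` is a hypothetical name.

```javascript
// Sketch: normalize one package's raw coverage-summary totals directly,
// without routing a single entry through the multi-entry aggregation path.
const METRICS = ["lines", "statements", "functions", "branches"];

function normalizeSummaryTotals(rawTotals) {
  const totals = {};
  for (const metric of METRICS) {
    const raw = rawTotals?.[metric] ?? {};
    const covered = Number(raw.covered) || 0;
    const total = Number(raw.total) || 0;
    totals[metric] = {
      covered,
      total,
      pct: total > 0 ? (covered / total) * 100 : null, // null renders as "n/a"
    };
  }
  return totals;
}

console.log(normalizeSummaryTotals({ lines: { covered: 50, total: 100 } }).lines.pct); // 50
```

Reading it as "normalize one summary" rather than "aggregate entries" makes the single-package intent obvious at the call site.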
Summary of changes
CI: PR coverage generation + trusted comment publisher
- Added a PR workflow to run coverage + aggregate a report and upload artifacts: `.github/workflows/coverage-pr.yml`
  - Runs `pnpm coverage` then `pnpm coverage:report`
  - Builds `coverage/coverage-comment.md` with a stable marker `<!-- coverage-pr-summary -->`
  - Uploads `coverage-report.{json,md}`, the comment body, and per-workspace `coverage-summary.json` files (with `if-no-files-found: error`).
- Added a trusted workflow to publish/update a sticky PR comment from uploaded artifacts: `.github/workflows/coverage-pr-comment.yml`
  - Triggers on `workflow_run` completion of the coverage workflow
  - Downloads the artifact and uses `peter-evans/*` to find/update the comment.
Repo wiring
- Updated root scripts in `package.json` to add `check:coverage-scripts` (guardrail) plus `coverage`, `coverage:js`, and `coverage:report`.
- Added `test:coverage` scripts across multiple packages delegating to a shared runner: `node ../../scripts/run-vitest-coverage.mjs`.
- Updated the Turbo pipeline in `turbo.json` to add `test:coverage` with `coverage/**` outputs.
Tooling scripts
- Added workspace discovery utilities: `scripts/workspace-packages.mjs` (parses `pnpm-workspace.yaml` using `yaml`, with a fallback walker).
- Added a coverage-script guardrail: `scripts/check-test-coverage-scripts.mjs` (enforces `test:coverage` for packages depending on `vitest`, with an exclusion list).
- Added a coverage runner: `scripts/run-vitest-coverage.mjs` (runs optional `pretest` then standardized `vitest --coverage` flags via `pnpm exec`).
- Added a coverage aggregator: `scripts/coverage/aggregate.mjs`
  - Discovers targets from workspace manifests that declare `test:coverage`
  - Reads `coverage/coverage-summary.json` per target
  - Fails on missing summaries unless `COVERAGE_ALLOW_MISSING=1`
  - Emits JSON + Markdown report with `n/a` for no-data metrics.
Docs / ignore rules
- Updated `.gitignore` to ignore `coverage/` and per-workspace `coverage` directories.
- Added coverage lane documentation to `CONTRIBUTING.md` and script index updates to `scripts/README.md`.
Dependency normalization
- Normalized Vitest to `^3.2.4` across packages and added `@vitest/coverage-v8` where coverage is enabled.
- Added root devDependency `yaml` for workspace parsing.
```markdown
### Coverage reporting (report-only)

The repository includes a report-only coverage lane for pull requests.

From the repo root:

- `pnpm coverage` — run all coverage-enabled package tests via Turbo
- `pnpm coverage:js` — JS-only coverage lane (matches `test:js` semantics)
- `pnpm coverage:report` — aggregate package summaries into JSON + markdown

The aggregate report emits a single all-unit-tests summary using whatever package `coverage-summary.json` files are present (plus package-level and missing-summary details).

`pnpm coverage:js` still runs a JS-only execution lane (matching `test:js` semantics), but `pnpm coverage:report` no longer adds parity/non-parity-specific views.
```
`coverage:js` is documented in CONTRIBUTING.md as “matches `test:js` semantics”, but the script currently does not match: `test:js` excludes `@rybosome/tspice-backend-node`, while `coverage:js` excludes backend-node and also changes inclusion by explicitly listing packages.

This mismatch is a correctness/operational issue because contributors will trust docs and end up comparing apples-to-oranges between `test:js` and `coverage:js` (and your PR automation uses `pnpm coverage` today, but the lane is still part of the public interface and will drift).

Suggestion

Update the docs and/or the script so they truly align.

Smallest fix: change the doc line to describe what `coverage:js` actually does (a curated JS-only list) rather than claiming parity with `test:js`.

Better fix: define `coverage:js` in terms of the same filter semantics as `test:js` so it stays aligned automatically, e.g.:

```json
"coverage:js": "turbo run test:coverage --filter=!@rybosome/tspice-backend-node"
```

If you want, reply with "@CharlieHelps yes please" and I’ll add a commit that makes `coverage:js` match `test:js` semantics and adjusts the docs accordingly.
```json
"test": "turbo run test",
"test:js": "turbo run test --filter=!@rybosome/tspice-backend-node",
"coverage": "turbo run test:coverage --filter=@rybosome/tspice-backend-contract --filter=@rybosome/tspice-backend-fake --filter=@rybosome/tspice-backend-node --filter=@rybosome/tspice-backend-wasm --filter=@rybosome/tspice-core --filter=@rybosome/tspice-parity-checking --filter=@rybosome/tspice",
"coverage:js": "turbo run test:coverage --filter=@rybosome/tspice-backend-contract --filter=@rybosome/tspice-backend-fake --filter=@rybosome/tspice-backend-wasm --filter=@rybosome/tspice-core --filter=@rybosome/tspice-parity-checking --filter=@rybosome/tspice",
"coverage:report": "node scripts/coverage/aggregate.mjs",
```
The `coverage` / `coverage:js` scripts hard-code a list of `--filter=@rybosome/...` entries. This is a second source of truth that can drift from:

- which packages actually have `test:coverage`
- what the aggregator discovers (it discovers by `test:coverage`)

Net effect: CI can run coverage for a set of packages that doesn’t match what `aggregate.mjs` expects (or vice versa), and adding a new coverage-enabled package requires touching multiple files.

Suggestion

Make the execution lane derive from the same “source of truth” as aggregation. Two minimal approaches:

- Switch to negative filters (like `test:js`) and let `test:coverage` presence control scope via Turbo itself.
- Add a script that prints discovered coverage targets (from `listWorkspacePackageManifests()` + `test:coverage`) and feed that to Turbo.
Reply with "@CharlieHelps yes please" if you’d like me to add a commit that simplifies coverage/coverage:js to avoid hard-coded filter lists and reduce drift.
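The "print discovered coverage targets" approach could be sketched like this, assuming manifests have already been discovered by `listWorkspacePackageManifests()`; the function name is hypothetical.

```javascript
// Derive Turbo --filter flags from the same source of truth the aggregator
// uses: the presence of a test:coverage script in each workspace manifest.
function coverageFilterArgs(manifests) {
  return manifests
    .filter(({ manifest }) => typeof manifest.scripts?.["test:coverage"] === "string")
    .map(({ manifest }) => `--filter=${manifest.name}`)
    .sort();
}

console.log(coverageFilterArgs([
  { manifest: { name: "@rybosome/tspice-core", scripts: { "test:coverage": "node run.mjs" } } },
  { manifest: { name: "@rybosome/orrery", scripts: { test: "vitest run" } } },
]));
// [ '--filter=@rybosome/tspice-core' ]
```

The printed flags could then be spread into `turbo run test:coverage`, so adding a `test:coverage` script to a package is the only change needed to bring it into the lane.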
```yaml
- name: Resolve PR number from workflow_run payload
  id: pr
  run: |
    pr_number="$(jq -r '.workflow_run.pull_requests[0].number // empty' "$GITHUB_EVENT_PATH")"

    if [ -z "$pr_number" ]; then
      echo "No pull request found in workflow_run payload; skipping."
      echo "number=" >> "$GITHUB_OUTPUT"
      exit 0
    fi

    echo "number=$pr_number" >> "$GITHUB_OUTPUT"

- name: Download coverage artifact
  if: ${{ steps.pr.outputs.number != '' }}
  uses: actions/download-artifact@v4
  with:
    name: coverage-pr-${{ steps.pr.outputs.number }}
    run-id: ${{ github.event.workflow_run.id }}
    github-token: ${{ secrets.GITHUB_TOKEN }}
    path: coverage-artifact
```
The trusted `workflow_run` publisher is a good direction, but it still implicitly trusts that the `coverage-pr-${{ steps.pr.outputs.number }}` artifact exists for `run-id` and that the job has rights to download it. When coverage runs are re-run, or if GitHub changes `workflow_run.pull_requests` ordering, this can fail noisily.

Right now, a missing artifact will cause a hard failure (via `actions/download-artifact`), which is likely fine for correctness, but operationally it can create spurious red Xs on PRs. Consider making the download step explicitly tolerate “artifact not found” and skip publishing rather than failing the entire workflow (while still failing for unexpected file layout once the artifact is present).
Suggestion

You can make this more robust by turning the artifact download into a guarded step that exits cleanly if the artifact is absent. One approach:

- Add `continue-on-error: true` on the download step.
- Then gate subsequent steps on `steps.download.outcome == 'success'`.

Example:

```yaml
- name: Download coverage artifact
  id: download
  if: ${{ steps.pr.outputs.number != '' }}
  continue-on-error: true
  uses: actions/download-artifact@v4
  with:
    name: coverage-pr-${{ steps.pr.outputs.number }}
    run-id: ${{ github.event.workflow_run.id }}
    github-token: ${{ secrets.GITHUB_TOKEN }}
    path: coverage-artifact

- name: Resolve coverage comment body path
  if: ${{ steps.download.outcome == 'success' }}
  ...
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit implementing this hardening.
```js
const vitestArgs = [
  "exec",
  "vitest",
  "run",
  "--coverage",
  "--coverage.provider=v8",
  "--coverage.reporter=text-summary",
  "--coverage.reporter=json-summary",
  "--coverage.reporter=lcov",
  "--coverage.reportsDirectory=coverage",
  ...process.argv.slice(2),
];

const vitestExitCode = run(PNPM_CMD, vitestArgs);
process.exit(vitestExitCode);
```
`scripts/run-vitest-coverage.mjs` invokes Vitest via `pnpm exec vitest ...`. That’s fine, but it makes it very easy for callers to accidentally pass flags that change output locations or reporters, which would break the aggregator and the workflow artifact expectations.

Given this is now “shared infra”, it’s worth making the contract explicit: either strictly control the reporters/output dir, or validate after the run that `coverage/coverage-summary.json` exists and fail loudly if not.
Suggestion

Add a post-run verification step in this runner to assert the expected summary exists in the current package CWD. Example:

```js
import fs from "node:fs";
import path from "node:path";

const summaryPath = path.resolve("coverage/coverage-summary.json");
if (vitestExitCode === 0 && !fs.existsSync(summaryPath)) {
  console.error(`Expected coverage summary not found at ${summaryPath}`);
  process.exit(1);
}
```

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this guard.
```js
// Intentionally out of initial coverage-script scope for issue #522.
const EXCLUDED_FROM_INITIAL_COVERAGE = new Set([
  "@rybosome/orrery",
  "@rybosome/tspice-bench-contract",
]);

function hasVitestDependency(manifest) {
  const dependencies = {
    ...(manifest.dependencies ?? {}),
    ...(manifest.devDependencies ?? {}),
  };

  return typeof dependencies.vitest === "string" && dependencies.vitest.trim().length > 0;
}

function hasCoverageScript(scripts) {
  return typeof scripts?.["test:coverage"] === "string" && scripts["test:coverage"].trim().length > 0;
}

async function main() {
  const manifests = await listWorkspacePackageManifests();

  const missingCoverageScripts = [];

  for (const { manifestPath, manifest } of manifests) {
    const packageName =
      typeof manifest.name === "string" && manifest.name.trim().length > 0
        ? manifest.name
        : manifestPath;

    if (EXCLUDED_FROM_INITIAL_COVERAGE.has(packageName)) {
      continue;
    }

    if (!hasVitestDependency(manifest)) {
      continue;
    }

    if (!hasCoverageScript(manifest.scripts)) {
      missingCoverageScripts.push({
        packageName,
        manifestPath,
      });
    }
  }
```
`aggregate.mjs` discovers coverage targets by the presence of a `test:coverage` script, but the guardrail (`check-test-coverage-scripts.mjs`) enforces `test:coverage` for any workspace package that depends on `vitest` (minus the exclusion set).

This coupling is fragile:

- a package can have `vitest` installed for utilities or shared configs without being a coverage target
- conversely, a package might run vitest via `pnpm exec vitest` without listing it directly (rare, but possible)

The practical impact is policy churn: the repo may be forced to add `test:coverage` where it wasn’t intended, and the aggregator scope will change as a side-effect.

Suggestion

Make the “coverage scope” explicit and consistent.

Smallest fix: change the guardrail to only require `test:coverage` when a package has both:

- a `vitest` dependency, and
- a `test` script that runs vitest (or a `vitest.config.*` present),

or introduce a manifest flag (e.g. `"tspice": { "coverage": true }`) that the aggregator + guardrail both use.

Reply with "@CharlieHelps yes please" if you’d like me to add a commit that aligns the guardrail scope with aggregation via an explicit opt-in flag (least surprising long-term).
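The opt-in flag could be consumed by both scripts through a single shared predicate, sketched below. The `tspice.coverage` manifest field is a proposed convention, not something the repo currently defines.

```javascript
// Shared predicate for the guardrail and the aggregator: a package is a
// coverage target only if it opts in explicitly via its manifest.
function isCoverageTarget(manifest) {
  return manifest?.tspice?.coverage === true;
}

console.log(isCoverageTarget({ tspice: { coverage: true } })); // true
console.log(isCoverageTarget({ devDependencies: { vitest: "^3.2.4" } })); // false
```

With one predicate imported by both `check-test-coverage-scripts.mjs` and `aggregate.mjs`, the enforced scope and the aggregated scope cannot drift apart.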
Summary

- `test:coverage` scripts for requested packages, excluding `apps/orrery` and `packages/bench-contract`.
- Vitest normalized to `v3.2.4`.

Closes #522