fix(eval): ground LLM judge with command reference to prevent false negatives #3284

Triggered via pull request April 10, 2026 11:30
Status Success
Total duration 6m 33s
Artifacts 5

ci.yml

on: pull_request
Detect Changes (4s)
Lint & Typecheck (30s)
Validate generated files (10s)
Matrix: build-binary
Matrix: build-npm
Generate Delta Patches (0s)
Build Docs (21s)
Publish Nightly to GHCR
CI Status (5s)

Annotations

1 warning
Unit Tests
Patch coverage defaulted to 100% because no changed files matched coverage data.
Unmatched diff files: .github/workflows/ci.yml, .github/workflows/eval-skill-fork.yml, AGENTS.md, script/eval-skill.ts, test/e2e/skill-eval.test.ts, test/skill-eval/helpers/judge.ts, test/skill-eval/helpers/verify.ts
Sample coverage paths: src/app.ts, src/cli.ts, src/commands/api.ts
This usually indicates a path format mismatch between your coverage tool and the repository.
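
The warning above describes a matching step: each changed file is looked up in the coverage data, and when nothing matches the patch coverage falls back to 100%. The sketch below illustrates that idea only. The helper names (`normalize`, `matchChangedFiles`) and the fallback rule are assumptions made for illustration, not the coverage action's actual implementation, and the coverage list holds only the three sample paths quoted in the warning.

```typescript
// Hypothetical helpers: an illustration of the matching idea, not the real tool's code.

// Normalize a path so "./src/app.ts" and "src\\app.ts" both compare as "src/app.ts".
function normalize(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

// Split changed files into those present in the coverage data and those absent from it.
function matchChangedFiles(
  changed: string[],
  covered: string[],
): { matched: string[]; unmatched: string[] } {
  const coveredSet = new Set(covered.map(normalize));
  const matched: string[] = [];
  const unmatched: string[] = [];
  for (const file of changed) {
    if (coveredSet.has(normalize(file))) {
      matched.push(file);
    } else {
      unmatched.push(file);
    }
  }
  return { matched, unmatched };
}

// Changed files as listed in the warning.
const changedFiles = [
  ".github/workflows/ci.yml",
  ".github/workflows/eval-skill-fork.yml",
  "AGENTS.md",
  "script/eval-skill.ts",
  "test/e2e/skill-eval.test.ts",
  "test/skill-eval/helpers/judge.ts",
  "test/skill-eval/helpers/verify.ts",
];

// Only the sample coverage paths quoted in the warning; the full list is not shown.
const coveragePaths = ["src/app.ts", "src/cli.ts", "src/commands/api.ts"];

const result = matchChangedFiles(changedFiles, coveragePaths);

// Assumed fallback: with no matched files there are no lines to measure.
const patchCoverageDefaulted = result.matched.length === 0;

console.log(result.unmatched.length, patchCoverageDefaulted);
```

In this run every changed file sits outside `src/` (workflows, docs, scripts, tests), so an empty match may simply reflect files that are not instrumented rather than a real path format mismatch.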

Artifacts

Produced during runtime
Name    Size    Digest
codecov-coverage-results-fix-eval-skill-judge-context-test-unit    169 KB    sha256:cfbb478af75a3b66aff7d69bccfcb87ba0704144987ce1ab2c5b0850385f7983
codecov-test-results-fix-eval-skill-judge-context-test-unit    227 Bytes    sha256:e4f0731f66ca0b7832b47206c64fba646899e25c1853da04d144824fef5f018b
gh-pages    1.79 MB    sha256:0440bdd878467fd3b4735690e0e3ab2f4ab7f8e17facc6b6f27c0d0b89ed0eff
npm-package    893 KB    sha256:c1ba028581da678f6fd713984e3911c7332bbb1c82ced1a3b54b47cc38c95f3f
sentry-linux-x64    30.1 MB    sha256:80adcce20ba40d248f779ae15152e4332f61d0d4f31ea16c90e197a83d4eb16f