fix(eval): ground LLM judge with command reference to prevent false negatives #3277

Triggered via pull request April 10, 2026 10:21
Status: Failure
Total duration: 4m 40s
Artifacts: 6

ci.yml

on: pull_request
Detect Changes: 5s
Lint & Typecheck: 25s
Validate generated files: 16s
Eval SKILL.md: 2m 4s
Matrix: build-binary
Matrix: build-npm
Generate Delta Patches: 0s
E2E Tests: 49s
Build Docs: 23s
Publish Nightly to GHCR: 0s
CI Status: 4s

Annotations

3 errors and 1 warning
Error (Eval SKILL.md): Process completed with exit code 1.
Error (CI Status): Process completed with exit code 1.
Error (CI Status): CI failed
Warning (Unit Tests): Patch coverage defaulted to 100% because no changed files matched coverage data. Unmatched diff files: script/eval-skill.ts, test/skill-eval/helpers/judge.ts. Sample coverage paths: src/app.ts, src/cli.ts, src/commands/api.ts. This usually indicates a path format mismatch between your coverage tool and the repository.
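The Unit Tests warning comes down to a set comparison between the files changed in the diff and the paths present in the coverage data. A minimal sketch of that check follows. The helper name `unmatchedDiffFiles` and its normalization rules are hypothetical, written only for illustration and not taken from the coverage tool; the inputs are the paths quoted in the warning, and the coverage list is only the sample the warning shows, not the full report.

```typescript
// Hypothetical helper (not the coverage tool's API): list the changed files
// that have no matching entry in the coverage data.
function unmatchedDiffFiles(diffFiles: string[], coveragePaths: string[]): string[] {
  // Assumed normalization: forward slashes only, no leading "./".
  const normalize = (p: string): string => p.replace(/\\/g, "/").replace(/^\.\//, "");
  const covered = new Set(coveragePaths.map(normalize));
  return diffFiles.map(normalize).filter((file) => !covered.has(file));
}

// Paths copied from the warning text.
const diffFiles = ["script/eval-skill.ts", "test/skill-eval/helpers/judge.ts"];
const coverageSample = ["src/app.ts", "src/cli.ts", "src/commands/api.ts"];

const unmatched = unmatchedDiffFiles(diffFiles, coverageSample);
console.log(unmatched);
```

Against the sample paths, neither changed file matches. Here the changed files sit under script/ and test/ while the sampled coverage paths all sit under src/, so the cause may be that these files are simply not instrumented rather than a path format mismatch.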

Artifacts

Produced during runtime
| Name | Size | Digest |
| --- | --- | --- |
| codecov-coverage-results-fix-eval-skill-judge-context-test-unit | 169 KB | sha256:4e3d31526d645704427ada415404059890b2e05457b5b166f6b76dac77900781 |
| codecov-test-results-fix-eval-skill-judge-context-test-unit | 227 Bytes | sha256:c13be0d750d2b5b875eb8a1f0a8a0ebd87fbacb937d02364d4d0cf8c3e1f7d0f |
| gh-pages | 1.79 MB | sha256:a9baa85c7c8f010af0513bbfb9398577d15a8d795060525a384e59d174a5c145 |
| npm-package | 892 KB | sha256:1c7ba63ebcbac8067a5edc12bdb21185ad32d0e69463b55c45977391f56679fe |
| sentry-linux-x64 | 30.1 MB | sha256:993eb70205d23f465b3d2bcaf32207db39cb0fddd4bf0b71c6d6297c00a3b8ad |
| skill-eval-results | 5.93 KB | sha256:a4d307be6ce05d1ff523d5e81d37b90d85cc678769886789fd0fa8c1c823ac71 |