fix(eval): ground LLM judge with command reference to prevent false negatives #3280

Triggered via pull request April 10, 2026 10:28
Status Success
Total duration 4m 51s
Artifacts 6

ci.yml

on: pull_request
Detect Changes (7s)
Lint & Typecheck (29s)
Validate generated files (15s)
Eval SKILL.md (2m 3s)
Matrix: build-binary
Matrix: build-npm
Generate Delta Patches (0s)
E2E Tests (56s)
Build Docs (25s)
Publish Nightly to GHCR (0s)
CI Status (3s)

Annotations

1 warning
Unit Tests
Patch coverage defaulted to 100% because no changed files matched coverage data.
Unmatched diff files: script/eval-skill.ts, test/skill-eval/helpers/judge.ts
Sample coverage paths: src/app.ts, src/cli.ts, src/commands/api.ts
This usually indicates a path format mismatch between your coverage tool and the repository.

Artifacts

Produced during runtime
Name                                                             Size       Digest
codecov-coverage-results-fix-eval-skill-judge-context-test-unit  169 KB     sha256:6716edc6dd8a975d0ea3193073182c61c34bdf64159338520ffbf681444373da
codecov-test-results-fix-eval-skill-judge-context-test-unit      227 Bytes  sha256:e760e0da93a004a291fa7e7eb0f47a92a77af96d1771bc69761d162856fc75ee
gh-pages                                                         1.79 MB    sha256:ac413bba3eb0f69cfaeb0a22dc0611f46e6deead5a40775ea631c4867a685cd3
npm-package                                                      893 KB     sha256:fb1fdf423b726e736d746e361ce267fe8647f48d0404ed392fb72184c755ce74
sentry-linux-x64                                                 30.1 MB    sha256:82a8971b1f637196aeb311d76eff9c88fd6cd8ed0d1987851fe9b3d19a1ba204
skill-eval-results                                               5.73 KB    sha256:4896867dcc897ab3333cff99c7ad2c41f39a134e7327f7b6fd1a8e16dd840222