fix(eval): ground LLM judge with command reference to prevent false negatives #3283

Triggered via pull request April 10, 2026 11:20
Status Failure
Total duration 5m 7s
Artifacts 5

ci.yml

on: pull_request
Detect Changes (5s)
Lint & Typecheck (24s)
Validate generated files (13s)
Matrix: build-binary
Matrix: build-npm
Generate Delta Patches (0s)
Build Docs (25s)
Publish Nightly to GHCR
CI Status (3s)

Annotations

5 errors and 1 warning
E2E Tests
Process completed with exit code 1.
error: expect(received).toBeGreaterThanOrEqual(expected): test/e2e/skill-eval.test.ts#L53
Expected: >= 0.75
Received: 0
at runEvalForModel (/home/runner/work/cli/cli/test/e2e/skill-eval.test.ts:53:19)
error: expect(received).toBeGreaterThanOrEqual(expected): test/e2e/skill-eval.test.ts#L53
Expected: >= 0.75
Received: 0
at runEvalForModel (/home/runner/work/cli/cli/test/e2e/skill-eval.test.ts:53:19)
CI Status
Process completed with exit code 1.
CI Status
CI failed
Unit Tests
Patch coverage defaulted to 100% because no changed files matched coverage data.
Unmatched diff files: .github/workflows/ci.yml, .github/workflows/eval-skill-fork.yml, AGENTS.md, script/eval-skill.ts, test/e2e/skill-eval.test.ts, test/skill-eval/helpers/judge.ts, test/skill-eval/helpers/verify.ts
Sample coverage paths: src/app.ts, src/cli.ts, src/commands/api.ts
This usually indicates a path format mismatch between your coverage tool and the repository.
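
The Unit Tests warning above can be reproduced in miniature. The following is a hypothetical sketch, not the coverage tool's actual implementation: the function name `matchDiffToCoverage` is invented for illustration, and the coverage list contains only the three sample paths the warning prints, so the real coverage data may hold more entries. It intersects the changed files with the covered paths and shows why nothing matches.

```typescript
// Hypothetical sketch: NOT the coverage tool's real logic.
// Paths are copied verbatim from the warning text.
const diffFiles: string[] = [
  ".github/workflows/ci.yml",
  ".github/workflows/eval-skill-fork.yml",
  "AGENTS.md",
  "script/eval-skill.ts",
  "test/e2e/skill-eval.test.ts",
  "test/skill-eval/helpers/judge.ts",
  "test/skill-eval/helpers/verify.ts",
];

// Only the sample paths shown in the warning; the full list is not available here.
const sampleCoveragePaths: string[] = [
  "src/app.ts",
  "src/cli.ts",
  "src/commands/api.ts",
];

// Split the changed files into those present in coverage data and those absent.
function matchDiffToCoverage(
  diff: string[],
  covered: string[],
): { matched: string[]; unmatched: string[] } {
  const coveredSet = new Set(covered);
  const matched = diff.filter((file) => coveredSet.has(file));
  const unmatched = diff.filter((file) => !coveredSet.has(file));
  return { matched, unmatched };
}

const result = matchDiffToCoverage(diffFiles, sampleCoveragePaths);
console.log(`matched: ${result.matched.length}, unmatched: ${result.unmatched.length}`);
```

Judging by the sample paths alone, the coverage data appears to cover files under src/, while every changed file in this pull request is a workflow, a document, a script, or a test. If that holds for the full coverage data, the warning may simply reflect that no instrumented source file changed, rather than a genuine path format mismatch.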

Artifacts

Produced during runtime
Name                                                             Size       Digest
codecov-coverage-results-fix-eval-skill-judge-context-test-unit  169 KB     sha256:2ecc97df550dd5a3d75f91352ade6c84cdd5ad233c16a5b3b0bbaddf87038226
codecov-test-results-fix-eval-skill-judge-context-test-unit      227 Bytes  sha256:dee043434cf08fc52116b8ada461b35f1224f6b9b9fa85ccfc4e377518d07795
gh-pages                                                         1.79 MB    sha256:f44e66c4b8a8eea6d1bc8a24e20b9d2f6d8666142c2f0c46f128dec9bc6e7992
npm-package                                                      893 KB     sha256:8d968444a4271ae7169a6d8076deb2b5ea5a718127b2397aefabf7bc3f96536f
sentry-linux-x64                                                 30.1 MB    sha256:140ea717e003202386d21b6ee1d09766ff416ecaeac76233d31f4927aeb979e5