fix(eval): ground LLM judge with command reference to prevent false negatives #3283
ci.yml
on: pull_request
Annotations
5 errors and 1 warning
E2E Tests
Process completed with exit code 1.

error: expect(received).toBeGreaterThanOrEqual(expected):
test/e2e/skill-eval.test.ts#L53
Expected: >= 0.75
Received: 0
at runEvalForModel (/home/runner/work/cli/cli/test/e2e/skill-eval.test.ts:53:19)

error: expect(received).toBeGreaterThanOrEqual(expected):
test/e2e/skill-eval.test.ts#L53
Expected: >= 0.75
Received: 0
at runEvalForModel (/home/runner/work/cli/cli/test/e2e/skill-eval.test.ts:53:19)

CI Status
Process completed with exit code 1.

CI Status
CI failed

Unit Tests
Patch coverage defaulted to 100% because no changed files matched coverage data.
Unmatched diff files: .github/workflows/ci.yml, .github/workflows/eval-skill-fork.yml, AGENTS.md, script/eval-skill.ts, test/e2e/skill-eval.test.ts, test/skill-eval/helpers/judge.ts, test/skill-eval/helpers/verify.ts
Sample coverage paths: src/app.ts, src/cli.ts, src/commands/api.ts
This usually indicates a path format mismatch between your coverage tool and the repository.
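
The warning above can be reproduced with a small sketch. It assumes the patch-coverage check is a plain exact-path intersection between the changed files and the coverage report; the real tool's matching logic is not shown in this log, and `matchChangedFiles` is a hypothetical helper, not part of any coverage tool's API. Only the three sample coverage paths printed in the warning are used, so the actual report likely contains more entries.

```typescript
// Hypothetical helper (an assumption for illustration): returns the changed
// files that also appear, by exact path, in the coverage data.
function matchChangedFiles(diffFiles: string[], coveragePaths: string[]): string[] {
  const covered = new Set(coveragePaths);
  return diffFiles.filter((file) => covered.has(file));
}

// Changed files, copied from the "Unmatched diff files" line of the warning.
const diffFiles: string[] = [
  ".github/workflows/ci.yml",
  ".github/workflows/eval-skill-fork.yml",
  "AGENTS.md",
  "script/eval-skill.ts",
  "test/e2e/skill-eval.test.ts",
  "test/skill-eval/helpers/judge.ts",
  "test/skill-eval/helpers/verify.ts",
];

// Only the sample paths shown in the warning; the full report is not in the log.
const sampleCoveragePaths: string[] = ["src/app.ts", "src/cli.ts", "src/commands/api.ts"];

const matched = matchChangedFiles(diffFiles, sampleCoveragePaths);
console.log(matched.length); // → 0, so there is nothing to compute patch coverage from
```

In this run every changed file sits outside `src/`, while every sample coverage path is under it, so the empty match may simply mean the changed files are not instrumented rather than that the path formats disagree.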
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| codecov-coverage-results-fix-eval-skill-judge-context-test-unit | 169 KB | sha256:2ecc97df550dd5a3d75f91352ade6c84cdd5ad233c16a5b3b0bbaddf87038226 |
| codecov-test-results-fix-eval-skill-judge-context-test-unit | 227 Bytes | sha256:dee043434cf08fc52116b8ada461b35f1224f6b9b9fa85ccfc4e377518d07795 |
| gh-pages | 1.79 MB | sha256:f44e66c4b8a8eea6d1bc8a24e20b9d2f6d8666142c2f0c46f128dec9bc6e7992 |
| npm-package | 893 KB | sha256:8d968444a4271ae7169a6d8076deb2b5ea5a718127b2397aefabf7bc3f96536f |
| sentry-linux-x64 | 30.1 MB | sha256:140ea717e003202386d21b6ee1d09766ff416ecaeac76233d31f4927aeb979e5 |