fix(eval): ground LLM judge with command reference to prevent false negatives #3277
ci.yml
on: pull_request
Annotations
3 errors and 1 warning
Eval SKILL.md
Process completed with exit code 1.
CI Status
Process completed with exit code 1.
CI Status
CI failed
Unit Tests
Patch coverage defaulted to 100% because no changed files matched coverage data.
Unmatched diff files: script/eval-skill.ts, test/skill-eval/helpers/judge.ts
Sample coverage paths: src/app.ts, src/cli.ts, src/commands/api.ts
This usually indicates a path format mismatch between your coverage tool and the repository.
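The warning above can be reproduced locally with a small check. The following is a minimal sketch, not the coverage tool's actual logic: `normalizePath` and `unmatchedDiffFiles` are hypothetical helpers, and the coverage list contains only the three sample paths quoted in the warning, not the full coverage report.

```typescript
// Hypothetical helper: put diff and coverage paths into one comparable form
// (forward slashes, no leading "./"). The real tool's rules may differ.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

// Hypothetical helper: return the changed files that have no entry in the coverage data.
function unmatchedDiffFiles(diffFiles: string[], coveragePaths: string[]): string[] {
  const covered = new Set(coveragePaths.map(normalizePath));
  return diffFiles.filter((file) => !covered.has(normalizePath(file)));
}

// Paths copied from the warning; the coverage list is only the quoted sample.
const diffFiles = ["script/eval-skill.ts", "test/skill-eval/helpers/judge.ts"];
const sampleCoveragePaths = ["src/app.ts", "src/cli.ts", "src/commands/api.ts"];

const unmatched = unmatchedDiffFiles(diffFiles, sampleCoveragePaths);
console.log(unmatched);
```

Against the sample paths, both changed files stay unmatched even after normalization. One possible reading is that `script/` and `test/` are simply outside the instrumented `src/` tree rather than the paths being formatted differently, but the sample alone is not enough to confirm that.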
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| codecov-coverage-results-fix-eval-skill-judge-context-test-unit | 169 KB | sha256:4e3d31526d645704427ada415404059890b2e05457b5b166f6b76dac77900781 |
| codecov-test-results-fix-eval-skill-judge-context-test-unit | 227 Bytes | sha256:c13be0d750d2b5b875eb8a1f0a8a0ebd87fbacb937d02364d4d0cf8c3e1f7d0f |
| gh-pages | 1.79 MB | sha256:a9baa85c7c8f010af0513bbfb9398577d15a8d795060525a384e59d174a5c145 |
| npm-package | 892 KB | sha256:1c7ba63ebcbac8067a5edc12bdb21185ad32d0e69463b55c45977391f56679fe |
| sentry-linux-x64 | 30.1 MB | sha256:993eb70205d23f465b3d2bcaf32207db39cb0fddd4bf0b71c6d6297c00a3b8ad |
| skill-eval-results | 5.93 KB | sha256:a4d307be6ce05d1ff523d5e81d37b90d85cc678769886789fd0fa8c1c823ac71 |