fix(eval): ground LLM judge with command reference to prevent false negatives #3280
ci.yml
on: pull_request
Annotations
1 warning
Unit Tests
Patch coverage defaulted to 100% because no changed files matched coverage data.
Unmatched diff files: script/eval-skill.ts, test/skill-eval/helpers/judge.ts
Sample coverage paths: src/app.ts, src/cli.ts, src/commands/api.ts
This usually indicates a path format mismatch between your coverage tool and the repository.
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| codecov-coverage-results-fix-eval-skill-judge-context-test-unit | 169 KB | sha256:6716edc6dd8a975d0ea3193073182c61c34bdf64159338520ffbf681444373da |
| codecov-test-results-fix-eval-skill-judge-context-test-unit | 227 Bytes | sha256:e760e0da93a004a291fa7e7eb0f47a92a77af96d1771bc69761d162856fc75ee |
| gh-pages | 1.79 MB | sha256:ac413bba3eb0f69cfaeb0a22dc0611f46e6deead5a40775ea631c4867a685cd3 |
| npm-package | 893 KB | sha256:fb1fdf423b726e736d746e361ce267fe8647f48d0404ed392fb72184c755ce74 |
| sentry-linux-x64 | 30.1 MB | sha256:82a8971b1f637196aeb311d76eff9c88fd6cd8ed0d1987851fe9b3d19a1ba204 |
| skill-eval-results | 5.73 KB | sha256:4896867dcc897ab3333cff99c7ad2c41f39a134e7327f7b6fd1a8e16dd840222 |