Commit 79e72c1
refactor(eval): replace OpenAI with Anthropic SDK in init-eval judge (#683)
## Summary
Standardizes all evals on the Anthropic SDK. The skill-eval already used
`@anthropic-ai/sdk`; this switches the init-eval judge from OpenAI
(`gpt-4o`) to Anthropic (`claude-sonnet-4-6`) and drops the `openai`
dependency.
## Changes
- `test/init-eval/helpers/judge.ts`: swap OpenAI client/API for
Anthropic Messages API
- `package.json`: remove `openai` from devDependencies
- `OPENAI_API_KEY` → `ANTHROPIC_API_KEY` env var (already required by
skill-eval)
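The judge swap above can be sketched roughly as follows. This is a minimal illustration, not the contents of `judge.ts`: the `MessagesClient` interface is a hand-written structural stand-in for the part of the `@anthropic-ai/sdk` client it uses, and `Verdict`, `extractText`, `parseVerdict`, `judge`, and the system prompt are hypothetical names and values introduced here. The main shape differences from the OpenAI chat API are that `max_tokens` is required, the system prompt is a top-level `system` parameter, and the reply arrives as an array of content blocks rather than a single string.

```typescript
// Structural stand-in for the parts of the Anthropic Messages API used here.
// The real SDK types are richer; these are assumptions made for the sketch.
type ContentBlock = { type: string; text?: string };
type MessageResponse = { content: ContentBlock[] };

interface MessagesClient {
  messages: {
    create(params: {
      model: string;
      max_tokens: number;
      system?: string;
      messages: { role: "user" | "assistant"; content: string }[];
    }): Promise<MessageResponse>;
  };
}

// Hypothetical verdict shape; the judge's real schema is not shown in this commit.
type Verdict = { pass: boolean; reason: string };

// Join the text blocks of a response, skipping any non-text blocks.
function extractText(response: MessageResponse): string {
  return response.content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string)
    .join("");
}

// Parse the judge's JSON reply into a Verdict, defaulting missing fields.
function parseVerdict(raw: string): Verdict {
  const parsed = JSON.parse(raw);
  return { pass: parsed.pass === true, reason: String(parsed.reason ?? "") };
}

// Ask the model to grade some output and return its verdict.
async function judge(client: MessagesClient, prompt: string): Promise<Verdict> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6", // model named in this commit
    max_tokens: 1024, // required by the Messages API; value chosen arbitrarily
    system: 'Reply only with JSON of the form {"pass": boolean, "reason": string}.',
    messages: [{ role: "user", content: prompt }],
  });
  return parseVerdict(extractText(response));
}
```

With the real SDK, the client would be created with `new Anthropic()` after `import Anthropic from "@anthropic-ai/sdk"`, which reads `ANTHROPIC_API_KEY` from the environment by default. Passing the client in as a parameter also lets tests substitute a fake.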
## Test plan
- [x] `bun eval:skill` passes (sonnet 100%, opus 87.5%)
- [x] `bun test:init-eval` — judge calls succeed with Anthropic (wizard
auth is a separate issue)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 584ec0e
3 files changed, +10 -13 lines changed