Evaluation results for each skill in the collection.
| Skill | Prompts | Last run | Score | Status |
|---|---|---|---|---|
| openalex | 8 | 2026-02-14 | 78/100 | 🟡 |
1. Copy `_template/` into a new folder named after the skill: `evals/<skill-name>/`
2. Fill `prompts.csv` — write 8–10 prompts covering explicit, implicit, and contextual invocations, plus at least 2 negative controls
3. Fill `checks.md` — list the deterministic checks (commands run, files created, output format)
4. Adjust `rubric.schema.json` — add or remove check IDs to match what matters for this skill
5. Run at least 2–3 prompt cases manually and save results in `results/YYYY-MM-DD.json`
6. Update the dashboard table above
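Step 1 can be sketched in a few lines. This is a minimal, hedged example: the temp directory stands in for the repo root so it is runnable anywhere, and the skill name `openalex` is taken from the dashboard table above.

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in repo root so the sketch is self-contained.
repo = Path(tempfile.mkdtemp())

# Recreate the _template/ layout described in steps 2-4.
template = repo / "evals" / "_template"
template.mkdir(parents=True)
for name in ("prompts.csv", "checks.md", "rubric.schema.json"):
    (template / name).touch()

# Step 1: copy _template/ into evals/<skill-name>/.
skill_dir = repo / "evals" / "openalex"
shutil.copytree(template, skill_dir)

print(sorted(p.name for p in skill_dir.iterdir()))
# → ['checks.md', 'prompts.csv', 'rubric.schema.json']
```

In the real repo you would run the copy against the actual `_template/` folder rather than a scaffolded temp directory.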
No code is required for steps 1–4; domain knowledge matters more than technical skill.
Each file in `results/` follows the rubric schema:

```json
{
  "overall_pass": true,
  "score": 85,
  "checks": [
    { "id": "trigger", "pass": true, "notes": "Skill activated on all expected prompts." },
    { "id": "workflow", "pass": true, "notes": "All 6 steps followed." },
    { "id": "pitfalls", "pass": false, "notes": "Once sorted by relevance_score without search." }
  ]
}
```
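A results file in this shape is easy to sanity-check before committing it. The sketch below is an assumption about useful invariants (score in 0–100, failed checks listable by ID), not part of the rubric itself; note that in the example above `overall_pass` can be true even with a failed check, so the two are not asserted to agree.

```python
import json

# Example results payload, matching the schema shown above
# (notes shortened for brevity).
result = json.loads("""
{
  "overall_pass": true,
  "score": 85,
  "checks": [
    {"id": "trigger", "pass": true, "notes": "..."},
    {"id": "workflow", "pass": true, "notes": "..."},
    {"id": "pitfalls", "pass": false, "notes": "..."}
  ]
}
""")

# Basic shape checks: score is a 0-100 number, each check has an id.
assert 0 <= result["score"] <= 100
assert all("id" in c and "pass" in c for c in result["checks"])

# List the checks that failed, e.g. to surface them in the dashboard.
failed = [c["id"] for c in result["checks"] if not c["pass"]]
print(failed)  # → ['pitfalls']
```

The same loop could be pointed at every file in `results/` to keep the dashboard table's Score column honest.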