Open
Description
Goal
Build an evaluation harness for an LLM feature.
Learn
- Deterministic vs LLM-as-judge evals
- Metrics for different use cases
- Test set creation
- Regression testing for prompts
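
To make the first Learn item concrete, here is a minimal sketch contrasting the two eval styles. Everything named here (`deterministic_eval`, `judge_eval`, `JUDGE_PROMPT`, `fake_judge`, the 1 to 5 scale, the threshold of 4) is an assumption for illustration and not part of the issue. The judge takes any `call_model` callable, so no particular LLM client is implied.

```python
from typing import Callable


def deterministic_eval(output: str, must_include: list[str]) -> bool:
    """Deterministic check: pass only if every required phrase appears (case-insensitive)."""
    text = output.lower()
    return all(phrase.lower() in text for phrase in must_include)


# Hypothetical rubric prompt; the wording and the 1-5 scale are assumptions.
JUDGE_PROMPT = (
    "Rate the assistant reply from 1 to 5 for helpfulness.\n"
    "Question: {question}\n"
    "Reply: {reply}\n"
    "Answer with a single digit."
)


def judge_eval(
    question: str,
    reply: str,
    call_model: Callable[[str], str],
    threshold: int = 4,
) -> bool:
    """LLM-as-judge check: ask a model for a score and compare it to a threshold."""
    raw = call_model(JUDGE_PROMPT.format(question=question, reply=reply)).strip()
    # Treat anything that does not start with a digit as a failing score of 0.
    score = int(raw[0]) if raw[:1].isdigit() else 0
    return score >= threshold


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "4"


print(deterministic_eval("Free shipping over $50", ["free shipping"]))
print(judge_eval("Do you ship to Canada?", "Yes, in 5 days.", fake_judge))
```

The deterministic check gives the same answer on every run and costs nothing, but it only sees surface features. The judge can score open-ended qualities, at the price of variance and a model call per case.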
Deliverable
- Eval suite for the AI shopping assistant
- Test set with labeled examples
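
One possible shape for the labeled test set is one JSON object per line (JSONL), loaded into a small dataclass. The field names (`case_id`, `user_message`, `must_include`, `tags`) and the sample rows are hypothetical; the issue does not specify a schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One labeled example; all field names are assumptions for this sketch."""
    case_id: str
    user_message: str
    must_include: list[str] = field(default_factory=list)  # label: phrases a good reply contains
    tags: list[str] = field(default_factory=list)          # for slicing results by use case


def load_cases(jsonl_text: str) -> list[EvalCase]:
    """Parse JSONL text into EvalCase objects, skipping blank lines."""
    cases = []
    for line in jsonl_text.splitlines():
        if line.strip():
            cases.append(EvalCase(**json.loads(line)))
    return cases


# Made-up sample rows, only to show the format.
SAMPLE = """
{"case_id": "ship-01", "user_message": "Do you ship to Canada?", "must_include": ["Canada"], "tags": ["shipping"]}
{"case_id": "ret-01", "user_message": "How do I return shoes?", "must_include": ["return"], "tags": ["returns"]}
"""

cases = load_cases(SAMPLE)
for case in cases:
    print(case.case_id, case.tags)
```

Keeping a stable `case_id` on each example makes it possible to compare results for the same case across prompt versions.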
Proof Point
Can measure whether a prompt change helped or hurt.
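
A sketch of how the proof point could be checked: run the same cases against a baseline and a candidate prompt, then report pass rates alongside the cases that flipped. The helper names (`run_suite`, `compare`), the two stub assistants, and the cases are all hypothetical; a real harness would call the actual assistant with each prompt version.

```python
from typing import Callable

# (case_id, user_message, phrases a passing reply must contain)
Case = tuple[str, str, list[str]]


def run_suite(assistant: Callable[[str], str], cases: list[Case]) -> dict[str, bool]:
    """Run every case through the assistant and record pass/fail per case_id."""
    results = {}
    for case_id, message, must_include in cases:
        reply = assistant(message).lower()
        results[case_id] = all(p.lower() in reply for p in must_include)
    return results


def compare(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict:
    """Summarize a prompt change: pass rates plus the cases it fixed or broke."""
    n = len(baseline)
    return {
        "baseline_pass_rate": sum(baseline.values()) / n,
        "candidate_pass_rate": sum(candidate.values()) / n,
        "fixed": sorted(c for c in baseline if not baseline[c] and candidate[c]),
        "broken": sorted(c for c in baseline if baseline[c] and not candidate[c]),
    }


# Stub assistants standing in for the old and new prompt (assumptions, not real output).
def old_prompt_assistant(message: str) -> str:
    return "We ship to Canada."


def new_prompt_assistant(message: str) -> str:
    return "Returns are accepted within 30 days."


CASES: list[Case] = [
    ("ship-01", "Do you ship to Canada?", ["canada"]),
    ("ret-01", "How do I return shoes?", ["30 days"]),
]

report = compare(
    run_suite(old_prompt_assistant, CASES),
    run_suite(new_prompt_assistant, CASES),
)
print(report)
```

In this toy run both prompts pass half the cases, yet the candidate fixes `ret-01` and breaks `ship-01`. Reporting the flipped cases, not just the aggregate rate, is what lets the harness show whether a change helped or hurt.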
Links
→ Feeds into: FasterShops AI Merchant Assistant
Directory
agent-patterns/03-evaluation-evals/