3.3 Evaluation & Evals

## Goal
Build an evaluation harness for an LLM feature.

## Learn
- Deterministic vs LLM-as-judge evals
- Metrics for different use cases
- Test set creation
- Regression testing for prompts
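
A minimal sketch of the first topic, deterministic versus LLM-as-judge evals. Everything here is illustrative: `exact_match`, `contains_all`, `judge_eval`, `JUDGE_PROMPT`, and the `call_model` callable are hypothetical names, and `fake_judge` is a stand-in so the example runs without a real model API.

```python
from typing import Callable


def exact_match(expected: str, actual: str) -> bool:
    """Deterministic check: the same inputs always give the same verdict."""
    return expected.strip().lower() == actual.strip().lower()


def contains_all(required: list[str], actual: str) -> bool:
    """Deterministic check: every required term appears in the reply."""
    text = actual.lower()
    return all(term.lower() in text for term in required)


# Hypothetical rubric prompt; the wording is an assumption, not a standard.
JUDGE_PROMPT = (
    "You are grading a shopping assistant reply.\n"
    "Question: {question}\n"
    "Reply: {answer}\n"
    "Respond with PASS or FAIL on the first line."
)


def judge_eval(question: str, answer: str, call_model: Callable[[str], str]) -> bool:
    """LLM-as-judge check: a model reads the reply and returns a verdict.

    `call_model` is any function that takes a prompt and returns text.
    """
    prompt = JUDGE_PROMPT.format(question=question, answer=answer)
    verdict = call_model(prompt)
    return verdict.strip().upper().startswith("PASS")


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return "PASS" if "30 days" in prompt else "FAIL"
```

Deterministic checks are cheap and repeatable, so they suit facts that must appear in a reply. A judge model fits qualities that string matching cannot capture, such as tone or helpfulness, at the cost of some run-to-run variance.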

## Deliverable
- Eval suite for the AI shopping assistant
- Test set with labeled examples
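
One possible shape for the labeled test set, sketched as JSONL loaded into a dataclass. The field names (`case_id`, `user_message`, `must_include`, `tags`) and the sample rows are assumptions made up for illustration, not a schema taken from the project.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One labeled example: an input plus the label used to grade the reply."""
    case_id: str
    user_message: str
    must_include: list[str] = field(default_factory=list)  # label: required terms
    tags: list[str] = field(default_factory=list)          # for slicing results


def load_cases(jsonl_text: str) -> list[EvalCase]:
    """Parse one JSON object per non-empty line into EvalCase records."""
    cases = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        cases.append(EvalCase(**json.loads(line)))
    return cases


# Illustrative rows only; real cases would come from actual user questions.
SAMPLE = """
{"case_id": "size-01", "user_message": "Do you have this shirt in medium?", "must_include": ["medium"], "tags": ["inventory"]}
{"case_id": "ship-01", "user_message": "How long does shipping take?", "must_include": ["shipping"], "tags": ["policy"]}
"""

cases = load_cases(SAMPLE)
```

Keeping the test set as one record per line makes diffs readable in review, and the `tags` field allows reporting scores per category rather than a single overall number.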

## Proof Point
The eval suite can measure whether a prompt change helped or hurt.
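
A rough sketch of how that measurement could work: run the same test set through the old and new prompt and compare pass rates. `pass_rate`, `compare`, the `tolerance` parameter, and the two stub generators are hypothetical; the stubs return canned text so the example runs without a model.

```python
from typing import Callable

# (user message, terms the reply must include)
Case = tuple[str, list[str]]


def pass_rate(generate: Callable[[str], str], cases: list[Case]) -> float:
    """Fraction of cases whose reply contains every required term."""
    if not cases:
        raise ValueError("test set is empty")
    passed = 0
    for message, required in cases:
        reply = generate(message).lower()
        if all(term.lower() in reply for term in required):
            passed += 1
    return passed / len(cases)


def compare(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: list[Case],
    tolerance: float = 0.0,
) -> str:
    """Label the candidate relative to the baseline on the same test set."""
    delta = pass_rate(candidate, cases) - pass_rate(baseline, cases)
    if delta > tolerance:
        return "improved"
    if delta < -tolerance:
        return "regressed"
    return "unchanged"


# Stubs standing in for "model + old prompt" and "model + new prompt".
def baseline_stub(message: str) -> str:
    return "We ship in 3 days."


def candidate_stub(message: str) -> str:
    return "We ship in 3 days. Returns are accepted within 30 days."


CASES: list[Case] = [
    ("How long is shipping?", ["ship"]),
    ("What is the return policy?", ["return"]),
]
```

With a real model, outputs vary between runs, so a nonzero `tolerance` or repeated runs per case would help separate a genuine change from noise.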

## Links
→ Feeds into: FasterShops AI Merchant Assistant

## Directory
`agent-patterns/03-evaluation-evals/`
