generated from amazon-archives/__template_Apache-2.0
Labels: enhancement (New feature or request)
Description
Problem Statement
Simplify the evaluation execution interface to reduce boilerplate when evaluating agents, making it easier to trigger agent calls without always requiring a custom task function wrapper.
Current State
Today, running evaluations with `experiment.run_evaluations()` requires users to define a task function that:
- Takes a `Case` object as input
- Manually instantiates and configures agents
- Handles telemetry setup and span collection for trace-based evaluators
- Maps spans to sessions using mappers
- Returns either raw output or a dictionary with output, trajectory, and interactions
This pattern is repetitive across examples and creates friction for users who just want to evaluate an agent quickly.
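To make the boilerplate concrete, here is a minimal sketch of the task-function pattern described above. All class and attribute names (`Case`, `Span`, `Agent`, `spans`) are illustrative stand-ins, not the actual library API:

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative stand-ins for the framework types; not the real library API.
@dataclass
class Case:
    input: str

@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)

class Agent:
    """Toy agent that records one tool-call span per invocation."""
    def __init__(self):
        self.spans: list[Span] = []

    def invoke(self, prompt: str) -> str:
        self.spans.append(Span("tool.search", {"query": prompt}))
        return f"answer to: {prompt}"

def task(case: Case) -> dict[str, Any]:
    # 1. Manually instantiate and configure the agent.
    agent = Agent()
    # 2. Run it and collect spans for trace-based evaluators.
    output = agent.invoke(case.input)
    spans = agent.spans
    # 3. Map spans to a trajectory by hand.
    trajectory = [s.name for s in spans]
    # 4. Return the dict shape the evaluators expect.
    return {"output": output, "trajectory": trajectory, "interactions": spans}
```

Every user evaluating an agent ends up re-writing some variant of this `task` wrapper.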
Proposed Solution
Provide convenience methods that allow users to pass agents or agent factories directly to run_evaluations, with automatic handling of common patterns like:
- Telemetry setup and span collection
- Session mapping for trace-based evaluation
- Output formatting
- Tool trajectory extraction
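One possible shape for such a convenience layer is a helper that turns an agent factory into the task function `run_evaluations` expects. The helper name (`task_from_agent_factory`) and the stand-in types below are hypothetical, a sketch of the idea rather than a proposed implementation:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Illustrative stand-ins; not the real library types.
@dataclass
class Case:
    input: str

@dataclass
class Span:
    name: str

class EchoAgent:
    def __init__(self):
        self.spans: list[Span] = []

    def invoke(self, prompt: str) -> str:
        self.spans.append(Span("tool.echo"))
        return prompt.upper()

def task_from_agent_factory(factory: Callable[[], Any]) -> Callable[[Case], dict]:
    """Hypothetical helper: wraps an agent factory into the task-function
    shape, handling span collection, trajectory extraction, and output
    formatting automatically."""
    def task(case: Case) -> dict:
        agent = factory()                    # fresh agent per case
        output = agent.invoke(case.input)    # trigger the agent call
        spans = getattr(agent, "spans", [])  # telemetry collected during the run
        return {
            "output": output,
            "trajectory": [s.name for s in spans],
            "interactions": spans,
        }
    return task

# With a helper like this, an evaluation run could shrink to roughly:
#   experiment.run_evaluations(task_from_agent_factory(EchoAgent))
```

Accepting the factory (rather than a shared agent instance) keeps each case isolated, which matters for trace-based evaluators that attribute spans per case.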
Use Case
- Users can evaluate simple agents in 3-5 lines of code instead of 15-20
- All existing examples continue to work without modification
Alternative Solutions
No response
Additional Context
No response