[FEATURE] Interface improvements to execute simple agent invocation #135

@poshinchen

Description

Problem Statement

Simplify the evaluation execution interface to reduce boilerplate when evaluating agents, making it easier to trigger agent calls without always requiring a custom task function wrapper.

Current State

Today, running evaluations with experiment.run_evaluations() requires users to define a task function that:

  • Takes a Case object as input
  • Manually instantiates and configures agents
  • Handles telemetry setup and span collection for trace-based evaluators
  • Maps spans to sessions using mappers
  • Returns either raw output or a dictionary with output, trajectory, and interactions

This pattern is repetitive across examples and creates friction for users who just want to evaluate an agent quickly.

Proposed Solution

Provide convenience methods that allow users to pass agents or agent factories directly to run_evaluations, with automatic handling of common patterns like:

  • Telemetry setup and span collection
  • Session mapping for trace-based evaluation
  • Output formatting
  • Tool trajectory extraction
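One possible shape of such a convenience layer is a helper that wraps an agent factory into the task signature `run_evaluations` expects. All names below (`default_task`, `agent_factory`, the `Case` stub, and the toy agent) are illustrative assumptions, not the proposed API itself:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical stand-in for the framework's Case type.
@dataclass
class Case:
    input: str

def default_task(agent_factory: Callable[[], Any]) -> Callable[[Case], dict]:
    """Build a task function from an agent factory, handling output
    formatting and tool-trajectory extraction automatically. Telemetry
    setup and session mapping would hook in here as well."""
    def task(case: Case) -> dict:
        agent = agent_factory()  # fresh agent per case
        result = agent.invoke(case.input)
        return {
            "output": result["text"],
            "trajectory": result.get("tool_calls", []),
        }
    return task

class EchoAgent:
    """Toy agent used only to make this sketch runnable."""
    def invoke(self, prompt: str) -> dict[str, Any]:
        return {"text": f"echo: {prompt}", "tool_calls": []}

# A convenience overload like run_evaluations(agent_factory=EchoAgent)
# could then build the task internally, so users never write one by hand.
task = default_task(EchoAgent)
```

Accepting a factory rather than an agent instance keeps each case isolated, which matters when agents hold conversation state.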

Use Case

  • Users can evaluate simple agents in 3-5 lines of code instead of 15-20
  • All existing examples continue to work without modification

Alternative Solutions

No response

Additional Context

No response

Metadata

Labels: enhancement (New feature or request)