
[FEATURE] Optional Case specific Goal for GoalSuccessRateEvaluator #74

@dbermuehler

Description


Problem Statement

With the current system prompt of the GoalSuccessRateEvaluator, the evaluator agent tries to deduce the goal of an evaluation case from the input and the trajectory. While writing evaluation cases, I realized that the evaluator is often unable to infer the correct goal of a test from this information alone. This is especially true for cases where the input is deliberately ambiguous in order to test whether the agent can still achieve a certain goal; in those cases the evaluator regularly misjudges the goal.

Proposed Solution

Similar to the ActorSimulator.from_case_for_user_simulator() function, which allows extending the default prompt of the ActorSimulator agent with a case-specific task description via the task_description metadata field, we should allow extending the GoalSuccessRateEvaluator system prompt with a case-specific goal via the goal metadata field of an evaluation case. The GoalSuccessRateEvaluator agent should then be instructed in its system prompt to either deduce the goal from the input and trajectory data of the conversation (as it does now) or, if one exists, use the case-specific goal.

If the proposed solution were implemented, evaluation cases could be defined as follows:

Case[str, str](
    name="verify-fix",
    input="I believe I fixed the issue ISSUE-1234. Can you start a validation to check?",
    metadata={
        "task_description": (
            "You are a developer who deployed a fix for ISSUE-1234 and want to verify it by running a validation. "
            "You only need confirmation that the validation started - you do NOT need to wait for results. "
            "Once the agent confirms the validation is running, thank them and end the conversation."
        ),
        "goal": (
            "The agent must start a validation for ISSUE-1234 and tell the user that the scan has been started. "
        ),
    },
)

Use Case

Currently, the GoalSuccessRateEvaluator cannot properly test scenarios like the following:

An agent is supposed to trigger a validation scan of some configuration based on an issue ID. If the intent is not clear from the user's message, the agent should ask for clarification before starting a scan.

# Unambiguous input without confirmation

**User**: Re-run the validation for ISSUE-1234.
**Agent**: Starting the validation job for ISSUE-1234...

# Ambiguous input with confirmation

**User**: ISSUE-1234
**Agent**: Do you want me to re-run the validation for ISSUE-1234?
**User**: Yes!
**Agent**: Starting the validation job for ISSUE-1234...

In both cases the agent has fully achieved the goal (starting a scan). However, since the input in the second example does not make the goal obvious to the evaluator right from the start, the evaluator rates it as "goal not achieved".

Alternative Solutions

We could override the system_prompt parameter of the GoalSuccessRateEvaluator so that it considers case-specific goals.

However, this has the following disadvantages:

  • We would need to instantiate a new GoalSuccessRateEvaluator for every evaluation case to add its case-specific goal.
  • We only want to extend the existing system prompt, not override it. Appending our specific goal to it is possible, but cumbersome.
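A hypothetical sketch of this alternative makes the duplication visible. `GoalSuccessRateEvaluator` here is a bare stand-in class, and `DEFAULT_SYSTEM_PROMPT` is assumed to be accessible to the caller, which may not even be the case in practice:

```python
# Hypothetical illustration of the per-case override approach. The class
# below is a stand-in, not the real evaluator; it only shows that the
# caller must rebuild the full prompt and a fresh instance for every case.

DEFAULT_SYSTEM_PROMPT = "You are an evaluator..."  # assumed to be exposed

class GoalSuccessRateEvaluator:  # stand-in for the real class
    def __init__(self, system_prompt: str) -> None:
        self.system_prompt = system_prompt

def evaluator_for_goal(goal: str) -> GoalSuccessRateEvaluator:
    # One new evaluator instance per case, with the default prompt and the
    # case-specific goal concatenated by hand.
    return GoalSuccessRateEvaluator(
        system_prompt=DEFAULT_SYSTEM_PROMPT + "\n\nGoal: " + goal
    )
```

Compared with reading the goal from the case metadata, this pushes prompt assembly onto every test author and silently breaks if the default prompt ever changes upstream.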

Additional Context

No response
