Description
Checks
- I have updated to the latest minor and patch version of Strands and evals
- I have checked the documentation and this is not expected behavior
- I have searched ./issues and there are no duplicates of my issue
Strands Version
0.1.5
Strands Evals Version
1.24.0
Python Version
3.12.3
Operating System
Ubuntu 24.04.3 LTS
Installation Method
None
Steps to Reproduce
Minimal Repro Script
from strands import Agent
from strands_evals import Case, Experiment
from strands_evals.evaluators import GoalSuccessRateEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

def task_fn(case):
    memory_exporter.clear()
    agent = Agent(
        trace_attributes={"session.id": case.session_id, "gen_ai.conversation.id": case.session_id},
        callback_handler=None,
    )
    result = agent([
        {"document": {"name": "invoice", "format": "pdf", "source": {"bytes": open("invoice.pdf", "rb").read()}}},
        {"text": "Analyze this document"},
    ])
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(memory_exporter.get_finished_spans(), session_id=case.session_id)
    # Bug: session.traces[0].spans[0].user_prompt will be "" because
    # the mapper does content_list[0].get("text", "") and index 0 is the document block
    print(f"user_prompt: '{session.traces[0].spans[0].user_prompt}'")
    return {"output": str(result), "trajectory": session}

experiment = Experiment(
    cases=[Case(input="Analyze this document")],
    evaluators=[GoalSuccessRateEvaluator()],
)
reports = experiment.run_evaluations(task=task_fn)
reports[0].run_display()
Expected Behavior
A multimodal prompt (or any input with multiple content blocks) should be visible to the evaluator. Without that, common enterprise use cases like agentic document processing cannot be tested effectively: the evaluators don't see what the model saw, so they can't meaningfully evaluate Faithfulness, Goal Success, etc.
Actual Behavior
The helper _convert_agent_invocation_span in strands_evals/mappers/strands_in_memory_session_mapper.py, which is invoked for each test case to convert the OTEL spans before the evaluators run, uses the following logic:
for event in span.events:
    try:
        event_attributes = event.attributes
        if not event_attributes:
            continue
        if event.name == "gen_ai.user.message":
            content_list = self._parse_json_attr(event_attributes, "content")
            user_prompt = content_list[0].get("text", "") if content_list else ""
        elif event.name == "gen_ai.choice":
            msg = event_attributes.get("message", "") if event_attributes else ""
            agent_response = str(msg)
    except Exception as e:
        logger.warning(f"Failed to process agent event {event.name}: {e}")
The user_prompt extraction here explicitly targets the first content block in the message, and only if that block is text. So a message that starts with a document or an image yields an empty prompt, and a message with multiple text content blocks is truncated to the first one.
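To make the failure concrete, here is a minimal standalone sketch (plain dicts mirroring the Bedrock content-block shapes from the repro above; no Strands imports) of what the first-block lookup actually returns:

```python
# Content list for a multimodal user message: the document block comes first.
content_list = [
    {"document": {"name": "invoice", "format": "pdf", "source": {"bytes": b"%PDF..."}}},
    {"text": "Analyze this document"},
]

# The mapper's current extraction: first block only, and only its "text" key.
user_prompt = content_list[0].get("text", "") if content_list else ""
print(repr(user_prompt))  # '' -- the document block has no "text" key

# Even an all-text message loses every block after the first.
multi_text = [{"text": "Part one."}, {"text": "Part two."}]
print(repr(multi_text[0].get("text", "")))  # 'Part one.' -- "Part two." is dropped
```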
This is either a bug or an undocumented limitation. It is also inconsistent with other logic in the same file. Earlier, on line 171, _process_user_message takes the somewhat better approach:
def _process_user_message(self, content_list: list[dict[str, Any]]) -> list[TextContent | ToolResultContent]:
    return [TextContent(text=item["text"]) for item in content_list if "text" in item]
which effectively reduces all of the text content blocks in the message into a list (while, oddly, still declaring ToolResultContent in its return type), but still silently drops non-text blocks.
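For contrast, a runnable sketch of that comprehension's behavior (TextContent is replaced by a minimal dataclass stand-in here, since the real strands_evals type isn't imported):

```python
from dataclasses import dataclass
from typing import Any

# Minimal stand-in for strands_evals' TextContent (hypothetical shape).
@dataclass
class TextContent:
    text: str

def process_user_message(content_list: list[dict[str, Any]]) -> list[TextContent]:
    # Same comprehension as the mapper's _process_user_message helper.
    return [TextContent(text=item["text"]) for item in content_list if "text" in item]

mixed = [
    {"document": {"name": "invoice", "format": "pdf"}},
    {"text": "Analyze this document"},
    {"text": "Summarize the totals"},
]
# Keeps both text blocks, but silently drops the document block.
print(process_user_message(mixed))
```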
Additional Context
No response
Possible Solution
There should be common logic for parsing the user prompt from spans, and it should support more than the simple case of a single text block.
def _reduce_content_blocks(self, content_list: list[dict[str, Any]]) -> list[TextContent | ToolResultContent]:
    """Reduce a list of Bedrock content blocks into TextContent entries.

    Handles text, document, image, and tool result blocks. Non-text content
    is represented as a descriptive placeholder so evaluators know it was present.
    """
    result: list[TextContent | ToolResultContent] = []
    for item in content_list:
        if "text" in item:
            result.append(TextContent(text=item["text"]))
        elif "document" in item:
            doc = item["document"]
            name = doc.get("name", "unknown")
            fmt = doc.get("format", "unknown")
            result.append(TextContent(text=f"[Document: {name}.{fmt}]"))
        elif "image" in item:
            img = item["image"]
            fmt = img.get("format", "unknown")
            result.append(TextContent(text=f"[Image: {fmt}]"))
        elif "toolResult" in item:
            tool_result = item["toolResult"]
            content = tool_result.get("content", [])
            text = content[0].get("text", "") if isinstance(content, list) and content else str(content)
            result.append(ToolResultContent(
                content=text,
                error=tool_result.get("error"),
                tool_call_id=tool_result.get("toolUseId"),
            ))
    return result
Related Issues
No response