
Description
Hi team, thanks for the great work on llmperf — it's been super helpful for benchmarking LLM APIs.
I'm wondering if there are any plans to extend the correctness test framework to support multimodal models, i.e., models that accept both text and image inputs (such as Qwen2-VL-7B-Instruct or glm-4v-9b).
This would be especially useful for evaluating models on tasks like OCR, image-to-text, or visual question answering. A rough sketch of the kind of request I have in mind is below.
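To make the idea concrete, here is a minimal sketch of what a multimodal correctness check could look like, assuming an OpenAI-compatible vision endpoint (e.g., a vLLM deployment). The endpoint URL, model name, and helper names are hypothetical, not part of llmperf today:

```python
import base64

# Hypothetical deployment details -- substitute your own.
API_BASE = "http://localhost:8000/v1"
MODEL = "Qwen/Qwen2-VL-7B-Instruct"


def encode_image(path: str) -> str:
    """Read a local image and return it as a base64 data URL."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{data}"


def build_ocr_correctness_request(image_path: str) -> dict:
    """Build an OpenAI-style chat completion payload that asks the model
    to transcribe the text in an image, so the response can later be
    compared against a known ground-truth string (an OCR-style check)."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Transcribe the text in this image."},
                    {"type": "image_url", "image_url": {"url": encode_image(image_path)}},
                ],
            }
        ],
        "max_tokens": 256,
    }


# Example usage (hypothetical image known to contain "hello world"):
# import requests
# resp = requests.post(f"{API_BASE}/chat/completions",
#                      json=build_ocr_correctness_request("hello.png"))
# answer = resp.json()["choices"][0]["message"]["content"]
# passed = "hello world" in answer.lower()
```

The correctness criterion could mirror the existing text-only tests: generate an image with known content, ask the model to read it back, and check the response against the ground truth.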
Would love to hear your thoughts!
Thanks 🙏