Description
We should add support for LLM cost tracking as part of the benchmarking workflow. Right now, we only measure performance metrics (BERTScore, accuracy, etc.), but cost is an equally important factor when comparing different models and providers. Many teams choosing between models need to understand not only how well a model performs but also how much each request or benchmark run costs.
The idea is to automatically estimate the total and per-request cost based on the provider's pricing model (input/output token rates, per-second billing where applicable, etc.) and include this information in the final benchmark results.
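As a minimal sketch of the per-request estimate, the cost reduces to token counts multiplied by the provider's rates. The rates below are illustrative placeholders, not real prices:

```python
# Hypothetical per-million-token rates; real prices vary by provider and model
# and would come from the pricing module proposed in this issue.
PRICE_IN = 3.00   # USD per 1M prompt tokens (placeholder)
PRICE_OUT = 15.00  # USD per 1M completion tokens (placeholder)

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single request from its token counts."""
    return (prompt_tokens * PRICE_IN + completion_tokens * PRICE_OUT) / 1_000_000

# Example: a request with 1,200 prompt tokens and 300 completion tokens.
print(f"${request_cost(1200, 300):.4f}")
```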
What Needs to Be Done
- Add a pricing module or integration for major providers (OpenAI, Anthropic, etc.).
- Track consumed tokens (prompt + completion) or other billable units per request.
- Calculate cost per request, per test case, and total benchmark cost.
- Include cost details in the output (CLI, JSON, reports).
- Add basic documentation explaining how cost is calculated and any assumptions made.
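The tracking and aggregation steps above could be sketched roughly as follows. The `PRICING` table, model names, and `CostTracker` class are all hypothetical, just one possible shape for the pricing module:

```python
import json
from dataclasses import dataclass, field

# Hypothetical pricing table (USD per 1M tokens). Model names and rates are
# illustrative only and would need to track each provider's pricing page.
PRICING = {
    "gpt-4o": {"in": 2.50, "out": 10.00},
    "claude-sonnet": {"in": 3.00, "out": 15.00},
}

@dataclass
class CostTracker:
    """Accumulates per-request costs and reports per-case and total cost."""
    model: str
    requests: list = field(default_factory=list)

    def record(self, case_id: str, prompt_tokens: int, completion_tokens: int) -> float:
        # Cost of one request: billable tokens times the model's rates.
        rates = PRICING[self.model]
        cost = (prompt_tokens * rates["in"] + completion_tokens * rates["out"]) / 1_000_000
        self.requests.append({"case": case_id, "cost": cost})
        return cost

    def report(self) -> str:
        # Aggregate per test case and overall, as JSON for the results output.
        per_case: dict[str, float] = {}
        for r in self.requests:
            per_case[r["case"]] = per_case.get(r["case"], 0.0) + r["cost"]
        return json.dumps({
            "model": self.model,
            "per_case_cost": per_case,
            "total_cost": sum(r["cost"] for r in self.requests),
        }, indent=2)
```

A benchmark runner would call `record(...)` once per request with the token usage reported by the provider, then attach `report()` to the CLI/JSON output.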
Why This Is Useful
This makes the benchmark results much more practical, especially for people choosing between multiple LLMs. Costs can vary significantly between models/providers, so having this information built-in makes the tool more complete and easier to use.
Potential dependencies:
Should we add LangChain integration before adding cost tracking?