Add LLM cost tracking #5

@DevilsAutumn

Description

We should add support for LLM cost tracking as part of the benchmarking workflow. Right now, we only measure performance metrics (BERTScore, accuracy, etc.), but cost is an equally important factor when comparing different models and providers. Many teams choosing between models need to understand not only how well a model performs but also how much each request or benchmark run costs.

The idea is to automatically estimate the total and per-request cost based on the provider’s pricing (tokens in/out, per-second billing if applicable, etc.) and include this information in the final benchmark results.
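As a rough sketch of the estimation step, cost per request can be derived from token counts and per-token prices. The prices and model names below are placeholders, not real pricing; a real implementation would load current rates from each provider's published pricing.

```python
# Hypothetical per-1M-token prices in USD. These numbers are illustrative
# only; actual prices vary by provider and change over time.
PRICING = {
    "example-gpt": {"input": 2.50, "output": 10.00},
    "example-claude": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    rates = PRICING[model]
    return (prompt_tokens * rates["input"]
            + completion_tokens * rates["output"]) / 1_000_000
```

Per-second billing (for hosted open-weight models) would need a separate code path keyed on elapsed time rather than tokens.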

What Needs to Be Done

  • Add a pricing module or integration for major providers (OpenAI, Anthropic, etc.).
  • Track consumed tokens (prompt + completion) or other billable units per request.
  • Calculate cost per request, per test case, and total benchmark cost.
  • Include cost details in the output (CLI, JSON, reports).
  • Add basic documentation explaining how cost is calculated and any assumptions made.
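To make the aggregation and output steps above concrete, here is a minimal sketch of a tracker that accumulates per-test-case costs and serializes them for a JSON report. All names here are hypothetical, not existing code in the repo.

```python
import json
from dataclasses import dataclass, field

@dataclass
class CostTracker:
    """Accumulates estimated USD cost per benchmark test case (illustrative)."""
    per_case: dict = field(default_factory=dict)

    def record(self, case_id: str, cost_usd: float) -> None:
        # Multiple requests may belong to one test case, so sum them.
        self.per_case[case_id] = self.per_case.get(case_id, 0.0) + cost_usd

    def total(self) -> float:
        return sum(self.per_case.values())

    def to_report(self) -> str:
        # Cost section that could be merged into the existing JSON output.
        return json.dumps(
            {"cost_per_case_usd": self.per_case, "total_cost_usd": self.total()},
            indent=2,
        )
```

A CLI summary could then print `tracker.total()` alongside the existing performance metrics.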

Why This Is Useful

This makes the benchmark results much more practical, especially for people choosing between multiple LLMs. Costs can vary significantly between models/providers, so having this information built-in makes the tool more complete and easier to use.

Potential dependencies:

Should we add LangChain integration before adding cost tracking?
