Description
We should add support for LLM cost tracking as part of the benchmarking workflow. Right now, we only measure performance metrics (BERTScore, accuracy, etc.), but cost is an equally important factor when comparing different models and providers. Many teams choosing between models need to understand not only how well a model performs but also how much each request or benchmark run costs.
The idea is to automatically estimate the total and per-request cost based on the provider's pricing model (input/output token rates, per-second billing where applicable, etc.) and include this information in the final benchmark results.
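As a minimal sketch of the per-request estimate, the cost reduces to token counts multiplied by the provider's rates. The rates below are illustrative placeholders, not real prices:

```python
# Hypothetical per-million-token rates; real prices vary by provider and model
# and would come from the pricing module proposed in this issue.
PRICE_IN = 3.00   # USD per 1M prompt tokens (placeholder)
PRICE_OUT = 15.00  # USD per 1M completion tokens (placeholder)

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single request from its token counts."""
    return (prompt_tokens * PRICE_IN + completion_tokens * PRICE_OUT) / 1_000_000

# Example: a request with 1,200 prompt tokens and 300 completion tokens.
print(f"${request_cost(1200, 300):.4f}")
```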
What Needs to Be Done
- Add a pricing module or integration for major providers (OpenAI, Anthropic, etc.).
- Track consumed tokens (prompt + completion) or other billable units per request.
- Calculate cost per request, per test case, and total benchmark cost.
- Include cost details in the output (CLI, JSON, reports).
- Add basic documentation explaining how cost is calculated and any assumptions made.
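The tracking and aggregation steps above could be sketched roughly as follows. The `PRICING` table, model names, and `CostTracker` class are all hypothetical, just one possible shape for the pricing module:

```python
import json
from dataclasses import dataclass, field

# Hypothetical pricing table (USD per 1M tokens). Model names and rates are
# illustrative only and would need to track each provider's pricing page.
PRICING = {
    "gpt-4o": {"in": 2.50, "out": 10.00},
    "claude-sonnet": {"in": 3.00, "out": 15.00},
}

@dataclass
class CostTracker:
    """Accumulates per-request costs and reports per-case and total cost."""
    model: str
    requests: list = field(default_factory=list)

    def record(self, case_id: str, prompt_tokens: int, completion_tokens: int) -> float:
        # Cost of one request: billable tokens times the model's rates.
        rates = PRICING[self.model]
        cost = (prompt_tokens * rates["in"] + completion_tokens * rates["out"]) / 1_000_000
        self.requests.append({"case": case_id, "cost": cost})
        return cost

    def report(self) -> str:
        # Aggregate per test case and overall, as JSON for the results output.
        per_case: dict[str, float] = {}
        for r in self.requests:
            per_case[r["case"]] = per_case.get(r["case"], 0.0) + r["cost"]
        return json.dumps({
            "model": self.model,
            "per_case_cost": per_case,
            "total_cost": sum(r["cost"] for r in self.requests),
        }, indent=2)
```

A benchmark runner would call `record(...)` once per request with the token usage reported by the provider, then attach `report()` to the CLI/JSON output.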
Why This Is Useful
This makes the benchmark results much more practical, especially for people choosing between multiple LLMs. Costs can vary significantly between models/providers, so having this information built-in makes the tool more complete and easier to use.
Potential dependencies:
Should we add LangChain integration before adding cost tracking?