This is a repository for TTBench, the Test-Time Compute Benchmark.
The benchmark provides chain-of-thought (CoT) responses queried from multiple LLMs on a variety of mathematical and reasoning datasets. The few-shot querying process and answer extraction are standardised across all datasets, saving researchers both time and money.
Please install this benchmark from source:

```bash
pip install .
```

The benchmark requires the api_responses.zip file (download it from Google Drive), which contains the response database.
For the following example, let us assume this file is in your code directory.
```python
from ttbench import load, DatasetType, LLMType

dataset, [llm1, llm2] = load(DatasetType.SVAMP, [LLMType.LLaMA3B32, LLMType.Qwen72B25], api_path="api_responses.zip")

for question_id, dataentry in dataset:
    print("Question: ", dataentry.question)
    print("True answer: ", dataentry.answer)
    llm1_response = llm1(question_id, N=20)
    print("Cost: $", llm1_response.cost)
    print("1st CoT answer: ", llm1_response.cots[0].answer)
```

Refer to the examples folder for more examples of benchmark evaluation.
We also provide a procedure to model the dollar cost of each query. This enables a fair comparison between test-time compute methods.
```python
from ttbench import load, DatasetType, LLMType

dataset, [llm] = load(DatasetType.CommonsenseQA, [LLMType.Mixtral8x7B], api_path="api_responses.zip")

question_id = 42
response = llm(question_id, N=2)
print(f"Request processing cost: ${response.request.cost:0.9f}")
print(f"First CoT response cost: ${response.cots[0].metadata.cost:0.9f}")
print(f"Total LLM query cost: ${response.cost:0.9f}")
```