LlamaBench is a comprehensive benchmarking framework for evaluating and comparing Large Language Models (LLMs). It provides an easy-to-use interface for running standardized tests across different models and generating detailed performance reports.
- Cloud and Local Model Support: Benchmark models from major providers such as OpenAI and Anthropic, as well as local models via HuggingFace Transformers and LlamaCpp (see the configuration sketch after this list)
- Pre-defined Task Suites: Evaluate models on reasoning, coding, factual knowledge, safety, and more
- Custom Task Creation: Easily define your own benchmark tasks with custom examples and evaluation metrics
- Parallel Execution: Run benchmarks across multiple models simultaneously
- Flexible Output Formats: Generate reports in JSON, CSV, Markdown, and HTML formats
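As a minimal sketch of the local-model path, the snippet below reuses `ModelConfig`, `run`, and `get_suite` from the quick start further down; the `"huggingface"` and `"llamacpp"` provider strings, the model identifiers and GGUF path, and the `"coding"` suite name are assumptions for illustration rather than confirmed API values.

```python
from llamabench import run, ModelConfig
from llamabench.suites import get_suite

# Hypothetical local-model configurations: the provider strings and the
# model identifiers/paths below are assumptions, not confirmed values.
local_models = [
    ModelConfig(provider="huggingface", model="mistralai/Mistral-7B-Instruct-v0.2"),
    ModelConfig(provider="llamacpp", model="models/llama-2-7b.Q4_K_M.gguf"),
]

# Local models go through the same entry point as cloud models
# (the "coding" suite name is also an assumption).
results = run(models=local_models, suite=get_suite("coding"))
print(results.summary())
```

If the API permits it, local and cloud configurations could be mixed in one models list so a single run compares both.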
Install from PyPI:

```bash
pip install llamabench
```

Quick start:

```python
from llamabench import run, ModelConfig
from llamabench.suites import get_suite
# Define models to benchmark
models = [
ModelConfig(provider="openai", model="gpt-4-turbo", temperature=0.0),
ModelConfig(provider="anthropic", model="claude-3-opus-20240229", temperature=0.0),
]
# Get a predefined benchmark suite
reasoning_suite = get_suite("reasoning")
# Run the benchmark
results = run(models=models, suite=reasoning_suite)
# Print results
print(results.summary())
```

Check out the examples directory for more usage examples:
- Basic benchmark: Simple comparison of cloud models
- Custom tasks: Creating your own benchmark tasks (see the sketch after this list)
- Local models: Using HuggingFace and LlamaCpp models
- CLI usage: Command line interface examples
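The sketch below illustrates what a custom task definition might look like; the `llamabench.tasks.Task` import path, its constructor fields, the `"exact_match"` metric name, the `tasks=` argument to `run`, and the `to_markdown` export call are all hypothetical and only meant to mirror the features listed above.

```python
from llamabench import run, ModelConfig
from llamabench.tasks import Task  # hypothetical import path

# Hypothetical custom task definition: the Task class, its fields, and the
# "exact_match" metric name are assumptions for illustration.
capitals = Task(
    name="capital-cities",
    prompt_template="What is the capital of {country}? Answer with the city name only.",
    examples=[
        {"country": "France", "expected": "Paris"},
        {"country": "Japan", "expected": "Tokyo"},
    ],
    metric="exact_match",
)

models = [ModelConfig(provider="openai", model="gpt-4-turbo", temperature=0.0)]

# The tasks= argument and the to_markdown() export are likewise assumptions;
# the README only states that JSON, CSV, Markdown, and HTML reports are supported.
results = run(models=models, tasks=[capitals])
results.to_markdown("capital_report.md")
```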
This project is licensed under the MIT License.