
LlamaBench

LlamaBench is a comprehensive benchmarking framework for evaluating and comparing Large Language Models (LLMs). It provides an easy-to-use interface for running standardized tests across different models and generating detailed performance reports.

Features

  • Cloud and Local Model Support: Benchmark models from major providers like OpenAI and Anthropic, as well as local models using HuggingFace Transformers and LlamaCpp
  • Pre-defined Task Suites: Evaluate models on reasoning, coding, factual knowledge, safety, and more
  • Custom Task Creation: Easily define your own benchmark tasks with custom examples and evaluation metrics (see the sketch after this list)
  • Parallel Execution: Run benchmarks across multiple models simultaneously
  • Flexible Output Formats: Generate reports in JSON, CSV, Markdown, and HTML formats
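As a rough illustration of custom task creation and parallel execution, the sketch below assumes a hypothetical Task class and a parallel flag on run. Only run, ModelConfig, get_suite, and results.summary() appear in the Quick Start below, so treat the other names as placeholders and consult the examples directory for the actual API.

# Sketch only: `Task` and the `parallel` flag are assumed names,
# not confirmed parts of the llamabench API.
from llamabench import run, ModelConfig, Task

# A custom task defined from a handful of examples and a simple metric
arithmetic_task = Task(
    name="two-digit-addition",
    examples=[
        {"prompt": "What is 17 + 25?", "expected": "42"},
        {"prompt": "What is 58 + 34?", "expected": "92"},
    ],
    metric="exact_match",
)

models = [
    ModelConfig(provider="openai", model="gpt-4-turbo", temperature=0.0),
    ModelConfig(provider="anthropic", model="claude-3-opus-20240229", temperature=0.0),
]

# Run both models against the custom task; `parallel=True` illustrates
# how simultaneous multi-model execution might be requested.
results = run(models=models, suite=[arithmetic_task], parallel=True)
print(results.summary())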

Installation

pip install llamabench

Quick Start

from llamabench import run, ModelConfig
from llamabench.suites import get_suite

# Define models to benchmark
models = [
    ModelConfig(provider="openai", model="gpt-4-turbo", temperature=0.0),
    ModelConfig(provider="anthropic", model="claude-3-opus-20240229", temperature=0.0),
]

# Get a predefined benchmark suite
reasoning_suite = get_suite("reasoning")

# Run the benchmark
results = run(models=models, suite=reasoning_suite)

# Print results
print(results.summary())
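Reports can then be generated in any of the supported output formats (JSON, CSV, Markdown, HTML). The snippet below is a sketch assuming exporter methods named to_json and to_markdown on the results object; the actual method names may differ, so check the examples directory.

# Sketch only: `to_json` / `to_markdown` are assumed method names
# for exporting the benchmark report in different formats.
results.to_json("reasoning_results.json")
results.to_markdown("reasoning_results.md")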

Examples

Check out the examples directory for more usage examples.

License

MIT
