
gollmperf - Professional LLM API Performance Testing Tool


English | 中文

Project Overview

gollmperf is a professional-grade Large Language Model (LLM) API performance testing tool designed to provide accurate and comprehensive LLM performance evaluation. The tool supports multiple LLM providers, offers multi-dimensional performance metrics, and features enterprise-level testing capabilities.

Core Features

1. Multi-dimensional Performance Testing

  • Throughput Testing: QPS (Queries Per Second) measurement
  • Latency Testing: TTFT (Time To First Token), response latency, P50/P90/P99 percentiles (see the percentile sketch after this list)
  • Quality Testing: Output quality assessment (optional)
  • Stability Testing: Long-term runtime stability verification
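
To make the percentile metrics concrete, here is a minimal Go sketch of nearest-rank P50/P90/P99 selection over a batch of request latencies. The nearest-rank rule shown is one common convention, not necessarily the exact one gollmperf implements:

package main

import (
    "fmt"
    "math"
    "sort"
    "time"
)

// percentile returns the q-th percentile (0-100) of latencies sorted
// ascending, using the nearest-rank convention: ceil(q/100 * n) - 1.
func percentile(sorted []time.Duration, q float64) time.Duration {
    if len(sorted) == 0 {
        return 0
    }
    idx := int(math.Ceil(q/100*float64(len(sorted)))) - 1
    if idx < 0 {
        idx = 0
    }
    return sorted[idx]
}

func main() {
    latencies := []time.Duration{
        900 * time.Millisecond, 1200 * time.Millisecond,
        750 * time.Millisecond, 2100 * time.Millisecond,
        1100 * time.Millisecond,
    }
    sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
    for _, q := range []float64{50, 90, 99} {
        fmt.Printf("P%.0f: %v\n", q, percentile(latencies, q))
    }
}

With few samples, high percentiles collapse onto the maximum observed latency, which is why P90 and P99 can coincide in small runs such as the example output later in this document.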

2. Multi-model Support

  • OpenAI (GPT series)
  • Qwen (Tongyi Qianwen series)
  • Custom API endpoints

3. Advanced Testing Modes

  • Basic Testing: Standard performance testing
  • Stress Testing: Gradually increase load until system limits are reached
  • Performance Testing: Run tests across multiple concurrency levels to find optimal performance parameters
  • Stability Testing: Long-term continuous runtime testing
  • Comparative Testing: Multi-model performance comparison
  • Scenario Testing: Specific business scenario simulation

4. Professional Metrics

  • TTFT (Time To First Token): First token latency (measurement sketched after this list)
  • TPS (Tokens Per Second): Tokens generated per second
  • Success Rate: Request success rate statistics
  • Error Analysis: Detailed error type and distribution
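
As an illustration of how TTFT can be measured against a streaming endpoint: start a timer, issue a streaming request, and record the elapsed time when the first response line arrives. The endpoint, payload, and one-SSE-line-per-chunk assumption below are illustrative placeholders, not gollmperf's actual implementation:

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
    "time"
)

// measureTTFT issues a streaming request and returns the time until the
// first response line arrives, plus the total latency.
func measureTTFT(url, body string) (ttft, total time.Duration, err error) {
    start := time.Now()
    resp, err := http.Post(url, "application/json", strings.NewReader(body))
    if err != nil {
        return 0, 0, err
    }
    defer resp.Body.Close()

    reader := bufio.NewReader(resp.Body)
    first := true
    for {
        line, err := reader.ReadBytes('\n') // assume one SSE line per token chunk
        if first && len(line) > 0 {
            ttft = time.Since(start)
            first = false
        }
        if err != nil {
            break // io.EOF ends the stream
        }
    }
    return ttft, time.Since(start), nil
}

func main() {
    // Hypothetical local endpoint and payload, for illustration only.
    ttft, total, err := measureTTFT("http://localhost:8080/v1/chat/completions",
        `{"model":"test","stream":true,"messages":[{"role":"user","content":"hi"}]}`)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    fmt.Printf("TTFT: %v, total: %v\n", ttft, total)
}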

5. Diverse Report Output

  • Real-time console output
  • JSON detailed data
  • CSV tabular data
  • HTML visualization reports

Technical Architecture Design

Core Modules

  1. Test Engine

    • Executes various test tasks
    • Precisely controls concurrent requests and load
    • Warm-up phase ensures test accuracy
  2. Data Collector

    • Collects performance data and metrics
    • Manages test result storage
  3. Statistical Analyzer

    • Calculates various performance metrics
    • Generates statistical data
  4. Report Generator

    • Generates reports in multiple formats
    • Provides visualization display
  5. Configuration Manager

    • Manages test configurations and parameters
    • Supports YAML configuration files
  6. Provider Interface

    • Unifies interfaces for different LLM providers
    • Supports extension for new providers
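
A sketch of what such a unified provider abstraction can look like in Go. The type and method names here are illustrative, not gollmperf's actual API; the point is that the engine depends only on the interface, so supporting a new provider means adding one implementation:

package provider

import "context"

// Request and Response are simplified stand-ins for the real types.
type Request struct {
    Model    string
    Messages []map[string]string
    Stream   bool
}

type Response struct {
    Text           string
    PromptTokens   int
    ResponseTokens int
}

// Provider is the unified interface each LLM backend (OpenAI, Qwen, a
// custom endpoint, ...) would implement.
type Provider interface {
    Name() string
    // Send issues one request and blocks until the full response arrives.
    Send(ctx context.Context, req Request) (Response, error)
}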

Technology Stack

  • Programming Language: Go (High performance, excellent concurrency support)
  • Concurrency Model: goroutine + channel (see the worker-pool sketch below)
  • HTTP Client: Standard library net/http
  • CLI Framework: Cobra
  • Configuration Management: Viper + YAML
  • Data Format: JSON, JSONL
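
The goroutine + channel model typically takes the shape of a bounded worker pool: jobs flow in on one channel, per-request measurements flow out on another, and the worker count caps concurrency. A minimal runnable sketch, with a sleep standing in for an API call:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    const workers = 4
    jobs := make(chan int)
    results := make(chan time.Duration)

    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for range jobs {
                start := time.Now()
                time.Sleep(50 * time.Millisecond) // stand-in for one API request
                results <- time.Since(start)
            }
        }()
    }

    go func() {
        for i := 0; i < 10; i++ {
            jobs <- i
        }
        close(jobs)
        wg.Wait()
        close(results)
    }()

    for latency := range results {
        fmt.Println("request took", latency)
    }
}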

Project Structure

gollmperf/
├── cmd/                 # Command-line interface
├── configs/             # Configuration file examples
├── examples/            # Sample data
├── internal/            # Core modules
│   ├── engine/          # Test engine
│   ├── collector/       # Data collector
│   ├── analyzer/        # Statistical analyzer
│   ├── reporter/        # Report generator
│   ├── config/          # Configuration management
│   ├── provider/        # Provider interface
│   └── utils/           # Utility functions
├── docs/                # Documentation
└── main.go              # Main program entry

Usage

Stress Testing

# Run stress testing
./gollmperf run --config ./configs/example.yaml 

Batch Testing

# Run batch testing with --batch flag
./gollmperf run --config ./configs/example.yaml --batch

Performance Testing

# Run performance test mode with --perf flag
./gollmperf run --perf --config ./configs/example.yaml

In performance testing mode, the tool runs tests at each concurrency level defined by the perf_concurrency_group configuration parameter to find the optimal concurrency setting.
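
Conceptually, perf mode is a loop over the configured concurrency levels. The sketch below shows the idea; runAtConcurrency is a hypothetical stand-in for executing the full test at one level and returning the measured QPS:

package main

import "fmt"

// runAtConcurrency is a hypothetical stand-in for running the full test
// at one concurrency level. The formula fakes a throughput curve that
// saturates and then degrades, purely for illustration.
func runAtConcurrency(c int) float64 {
    return 40*float64(c)/(float64(c)+8) - 0.1*float64(c)
}

func main() {
    // Mirrors the perf_concurrency_group setting from the example config.
    levels := []int{1, 2, 4, 8, 16, 20, 32, 40, 48, 64}

    bestLevel, bestQPS := 0, 0.0
    for _, c := range levels {
        qps := runAtConcurrency(c)
        fmt.Printf("concurrency=%d qps=%.2f\n", c, qps)
        if qps > bestQPS {
            bestLevel, bestQPS = c, qps
        }
    }
    fmt.Printf("best: concurrency=%d (%.2f QPS)\n", bestLevel, bestQPS)
}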

Command args can override config file fields

./gollmperf run -h

  -k, --apikey string          API key
  -b, --batch                  Run batch mode, for run all case in dataset
      --batch-result string    Batch results file path (output batch results to JSONL file)
  -d, --dataset string         Dataset file path
  -e, --endpoint string        Endpoint
  -f, --format string          Report format (json, csv, html) (default as report file extension)
  -m, --model string           Model name
  -p, --perf                   Run perf mode, for find performance limits in different concurrency levels
  -P, --provider string        LLM provider (openai, qwen, etc.) (default "openai")
  -r, --report string          Report file path (output report to file)
# Command args override config file fields
./gollmperf run --config ./configs/example.yaml --model gpt-3.5-turbo --dataset ./examples/test_cases.jsonl --report result.json --format json
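
Flag-over-config precedence like this is commonly wired up by binding each Cobra flag to the Viper key the YAML file uses, so an explicitly set flag wins over the file value. The sketch below shows that general pattern, not gollmperf's exact wiring:

package main

import (
    "fmt"

    "github.com/spf13/cobra"
    "github.com/spf13/viper"
)

func main() {
    cmd := &cobra.Command{
        Use: "run",
        RunE: func(cmd *cobra.Command, args []string) error {
            if cfg, _ := cmd.Flags().GetString("config"); cfg != "" {
                viper.SetConfigFile(cfg)
                if err := viper.ReadInConfig(); err != nil {
                    return err
                }
            }
            // A flag set on the command line takes precedence over the file value.
            fmt.Println("model:", viper.GetString("model.name"))
            return nil
        },
    }
    cmd.Flags().StringP("config", "c", "", "config file path")
    cmd.Flags().StringP("model", "m", "", "model name")
    // Bind the flag to the same key the YAML uses.
    _ = viper.BindPFlag("model.name", cmd.Flags().Lookup("model"))
    _ = cmd.Execute()
}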

Batch Results Output

# Run batch test and save results to JSONL file
./gollmperf run --config ./configs/example.yaml --batch --batch-result ./others/batch_results.jsonl
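
JSONL is simply one JSON object per line, which makes it convenient for appending one record per test case as a batch run progresses. A minimal sketch of writing such a file in Go; the record fields are illustrative, not gollmperf's actual schema:

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// batchResult is an illustrative record shape.
type batchResult struct {
    CaseID  int    `json:"case_id"`
    Success bool   `json:"success"`
    Output  string `json:"output"`
}

func main() {
    f, err := os.Create("batch_results.jsonl")
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()

    enc := json.NewEncoder(f) // Encode writes a trailing newline: one line per record
    for i, out := range []string{"hello", "world"} {
        if err := enc.Encode(batchResult{CaseID: i, Success: true, Output: out}); err != nil {
            fmt.Println(err)
            return
        }
    }
}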

Usage Examples

git clone https://github.com/FortuneW/gollmperf.git

cd gollmperf
go mod tidy
go build

export LLM_API_ENDPOINT="https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
export LLM_API_KEY="sk-xxx"
export LLM_MODEL_NAME="qwen-plus"

./gollmperf run -c configs/example.yaml

The results are as follows:

[INF] 2025-08-06T11:13:15.830Z [reporter] ========== gollmperf Performance Report ==========
[INF] 2025-08-06T11:13:15.830Z [reporter] Total Duration: 4.212978738s
[INF] 2025-08-06T11:13:15.830Z [reporter] Total Requests: 5
[INF] 2025-08-06T11:13:15.830Z [reporter] Successful Requests: 5
[INF] 2025-08-06T11:13:15.830Z [reporter] Failed Requests: 0
[INF] 2025-08-06T11:13:15.830Z [reporter] Success Rate: 100.00%
[INF] 2025-08-06T11:13:15.830Z [reporter] QPS: 1.19
[INF] 2025-08-06T11:13:15.830Z [reporter] Tokens per second: 197.01
[INF] 2025-08-06T11:13:15.830Z [reporter] Average Latency: 3.650148772s
[INF] 2025-08-06T11:13:15.830Z [reporter] Latency P50: 3.822069212s
[INF] 2025-08-06T11:13:15.830Z [reporter] Latency P90: 4.212863638s
[INF] 2025-08-06T11:13:15.830Z [reporter] Latency P99: 4.212863638s
[INF] 2025-08-06T11:13:15.830Z [reporter] Average Request Tokens: 22.00
[INF] 2025-08-06T11:13:15.830Z [reporter] Average Response Tokens: 144.00
[INF] 2025-08-06T11:13:15.830Z [reporter] Average First Token Latency: 488.073612ms
[INF] 2025-08-06T11:13:15.830Z [reporter] First Token Latency P50: 512.180236ms
[INF] 2025-08-06T11:13:15.830Z [reporter] First Token Latency P90: 583.377086ms
[INF] 2025-08-06T11:13:15.830Z [reporter] First Token Latency P99: 583.377086ms

By default, an HTML report is generated at ./results/report.html, relative to the current directory.

[Screenshot: report.html visualization report]

Configuration Example

# Example configuration for gollmperf

# Test configuration
test:
  # Duration of the test
  duration: 60s
  
  # Warmup period before starting measurements
  warmup: 10s
  
  # Number of concurrent requests
  concurrency: 10
  
  # Timeout for each request
  timeout: 30s
  
  # Concurrency levels for performance testing mode
  perf_concurrency_group: [1, 2, 4, 8, 16, 20, 32, 40, 48, 64]

# Model configuration
model:
  # Model name
  name: ${LLM_MODEL_NAME}
  
  # Provider (openai, qwen, etc.)
  provider: openai
  
  # API endpoint (optional, uses default if not specified)
  endpoint: ${LLM_API_ENDPOINT}

  # API key (required)
  api_key: ${LLM_API_KEY}

  # HTTP headers; any additional header fields can be added here
  headers:
    Content-Type: application/json

  # HTTP request parameter template; any additional fields can be added here
  params_template:
    stream: true
    stream_options:
      include_usage: true
    extra_body:
      enable_thinking: false

# Dataset configuration
dataset:
  # Type of dataset (jsonl, etc.)
  type: jsonl
  
  # Path to dataset file
  path: ./examples/test_cases.jsonl

# Output configuration
output:
  # Output formats (json, csv, html)
  format: html
    
  # Output file path
  path: ./results/report.html
  
  # Batch testing result file path (for saving batch test results in JSONL format)
  batch_result_path: ./results/batch_results.jsonl
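
For reference, the example YAML above could unmarshal into Go structs along these lines (e.g. via Viper, whose default decode hooks parse duration strings such as 60s). Struct and tag names here are illustrative; gollmperf's real configuration types may differ:

package config

import "time"

type Config struct {
    Test    TestConfig    `mapstructure:"test"`
    Model   ModelConfig   `mapstructure:"model"`
    Dataset DatasetConfig `mapstructure:"dataset"`
    Output  OutputConfig  `mapstructure:"output"`
}

type TestConfig struct {
    Duration             time.Duration `mapstructure:"duration"`
    Warmup               time.Duration `mapstructure:"warmup"`
    Concurrency          int           `mapstructure:"concurrency"`
    Timeout              time.Duration `mapstructure:"timeout"`
    PerfConcurrencyGroup []int         `mapstructure:"perf_concurrency_group"`
}

type ModelConfig struct {
    Name     string            `mapstructure:"name"`
    Provider string            `mapstructure:"provider"`
    Endpoint string            `mapstructure:"endpoint"`
    APIKey   string            `mapstructure:"api_key"`
    Headers  map[string]string `mapstructure:"headers"`
}

type DatasetConfig struct {
    Type string `mapstructure:"type"`
    Path string `mapstructure:"path"`
}

type OutputConfig struct {
    Format          string `mapstructure:"format"`
    Path            string `mapstructure:"path"`
    BatchResultPath string `mapstructure:"batch_result_path"`
}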

Professional Features

1. Precise Timing

  • Microsecond-level timing accuracy
  • Network latency separation measurement
  • GC impact exclusion
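
In Go, precise timing of this kind usually rests on the monotonic clock reading embedded in time.Now, which makes time.Since immune to wall-clock adjustments. A trivial sketch:

package main

import (
    "fmt"
    "time"
)

func main() {
    start := time.Now() // carries a monotonic clock reading
    time.Sleep(1500 * time.Microsecond) // stand-in for a measured operation
    fmt.Println("elapsed:", time.Since(start)) // microsecond-level resolution
}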

2. Load Control

  • Precise concurrency control
  • QPS target control
  • Adaptive load regulation
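
QPS target control is commonly implemented with a ticker that releases one request per interval, capping the request rate regardless of how many workers are idle. A minimal sketch of that pattern, not necessarily how gollmperf implements it:

package main

import (
    "fmt"
    "time"
)

func main() {
    const targetQPS = 5
    ticker := time.NewTicker(time.Second / targetQPS)
    defer ticker.Stop()

    start := time.Now()
    for i := 0; i < 10; i++ {
        <-ticker.C // wait for the next send slot
        go func(n int) {
            // stand-in for dispatching one API request
            fmt.Printf("request %d dispatched at %v\n", n, time.Since(start).Round(time.Millisecond))
        }(i)
    }
    time.Sleep(100 * time.Millisecond) // let the last goroutines print
}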

3. Data Validation

  • Response content validation
  • Token count accuracy
  • Error classification statistics

4. Enterprise Features

  • Multi-user support
  • Permission control
  • Log auditing
  • Cluster deployment support

Development Status

The project has completed core functionality development, including:

  • ✅ Project structure and basic framework
  • ✅ Configuration management module
  • ✅ Test execution module
  • ✅ JSONL batch testing functionality
  • ✅ Statistical analysis module
  • ✅ Report output module (console, JSON, CSV, HTML)
  • ✅ Command-line interface with flexible parameter configuration
  • ✅ Basic testing and validation
  • ✅ OpenAI provider implementation
  • ✅ Qwen provider implementation
  • ✅ Comprehensive metrics collection with error categorization
  • ✅ Batch testing
  • ✅ Stress testing
  • ✅ Performance testing mode

Future Optimization Directions

  1. Performance Optimization: Further enhance the testing tool's own performance
  2. Feature Expansion: Add more testing modes and metrics
  3. Provider Support: Increase support for more LLM providers
  4. Visualization Enhancement: Provide richer charts and dashboards
  5. Enterprise Features: Add user management, permission control, and other enterprise-level features
  6. Stress Testing Implementation: Complete the stress testing mode
  7. Comparative Testing Implementation: Add multi-model performance comparison capabilities

Summary

gollmperf provides a professional, accurate, and user-friendly solution for LLM performance testing. Through modular design and a clear architecture, the tool not only meets current testing requirements but also has good extensibility to adapt to future testing scenarios and needs.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
