gollmperf is a professional-grade Large Language Model (LLM) API performance testing tool designed to provide accurate and comprehensive LLM performance evaluation. The tool supports multiple LLM providers, offers multi-dimensional performance metrics, and features enterprise-level testing capabilities.
- Throughput Testing: QPS (Queries Per Second) measurement
- Latency Testing: TTFT (Time To First Token), response latency, P50/P90/P99 percentiles (a short computation sketch follows these lists)
- Quality Testing: Output quality assessment (optional)
- Stability Testing: Long-term runtime stability verification
- OpenAI (GPT series)
- Qwen (Tongyi Qianwen series)
- Custom API endpoints
- Basic Testing: Standard performance testing
- Stress Testing: Gradually increase load until system limits
- Performance Testing: Run tests across multiple concurrency levels to find optimal performance parameters
- Stability Testing: Long-term continuous runtime testing
- Comparative Testing: Multi-model performance comparison
- Scenario Testing: Specific business scenario simulation
- TTFT (Time To First Token): First token latency
- TPS (Tokens Per Second): Tokens generated per second
- Success Rate: Request success rate statistics
- Error Analysis: Detailed error type and distribution
- Real-time console output
- JSON detailed data
- CSV tabular data
- HTML visualization reports
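The throughput and latency figures in a report boil down to simple aggregates over per-request measurements. As a minimal, illustrative sketch (not gollmperf's internal code, and using a hypothetical `record` type), QPS, TPS, and the P50/P90/P99 percentiles can be computed like this:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// record is a hypothetical per-request measurement.
type record struct {
	latency time.Duration
	tokens  int // completion tokens
}

// percentile returns the p-th percentile (0-100) of sorted durations
// using the nearest-rank method.
func percentile(sorted []time.Duration, p float64) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(float64(len(sorted))*p/100.0+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

func main() {
	records := []record{
		{800 * time.Millisecond, 120},
		{950 * time.Millisecond, 140},
		{1200 * time.Millisecond, 180},
	}
	wall := 2 * time.Second // total wall-clock test duration

	latencies := make([]time.Duration, 0, len(records))
	totalTokens := 0
	for _, r := range records {
		latencies = append(latencies, r.latency)
		totalTokens += r.tokens
	}
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })

	qps := float64(len(records)) / wall.Seconds() // QPS: completed requests per second
	tps := float64(totalTokens) / wall.Seconds()  // TPS: generated tokens per second
	fmt.Printf("QPS=%.2f TPS=%.2f P50=%v P90=%v P99=%v\n",
		qps, tps,
		percentile(latencies, 50), percentile(latencies, 90), percentile(latencies, 99))
}
```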
- Test Engine
  - Executes various test tasks
  - Precisely controls concurrent requests and load
  - Warm-up phase ensures test accuracy
- Data Collector
  - Collects performance data and metrics
  - Manages test result storage
- Statistical Analyzer
  - Calculates various performance metrics
  - Generates statistical data
- Report Generator
  - Generates reports in multiple formats
  - Provides visualization display
- Configuration Manager
  - Manages test configurations and parameters
  - Supports YAML configuration files
- Provider Interface
  - Unifies interfaces for different LLM providers
  - Supports extension for new providers (an illustrative sketch follows this list)
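The provider layer is what keeps new backends pluggable. The actual interface lives in internal/provider and may look different; the following is only a sketch, assuming a hypothetical `Provider` interface and `Result` struct, of how a unified abstraction over OpenAI-compatible APIs is typically shaped in Go:

```go
package provider

import (
	"context"
	"time"
)

// Result is a hypothetical per-request measurement record.
type Result struct {
	FirstTokenLatency time.Duration // TTFT
	TotalLatency      time.Duration
	PromptTokens      int
	CompletionTokens  int
}

// Provider is a hypothetical unified interface; each backend
// (OpenAI, Qwen, custom endpoints, ...) would implement it.
type Provider interface {
	Name() string
	// Complete sends one request described by the test case and
	// returns timing and token statistics for it.
	Complete(ctx context.Context, prompt string) (Result, error)
}
```

Adding support for a new provider would then amount to implementing such an interface and selecting it via the provider field in the configuration.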
- Programming Language: Go (High performance, excellent concurrency support)
- Concurrency Model: goroutine + channel (a worker-pool sketch follows this list)
- HTTP Client: Standard library net/http
- CLI Framework: Cobra
- Configuration Management: Viper + YAML
- Data Format: JSON, JSONL
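To illustrate the goroutine + channel model (again a sketch, not the actual engine code): a fixed pool of workers drains a channel of prompts while per-request results are funneled back over a second channel, which is the usual way to get precise concurrency control in Go:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runFixedConcurrency is a minimal sketch: `concurrency` workers pull
// prompts from a channel, "send" them, and report per-request latencies.
// The real engine adds warm-up, timeouts, token accounting, and so on.
func runFixedConcurrency(prompts []string, concurrency int) []time.Duration {
	jobs := make(chan string)
	results := make(chan time.Duration, len(prompts))

	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				start := time.Now()
				time.Sleep(10 * time.Millisecond) // stand-in for the HTTP call
				results <- time.Since(start)
			}
		}()
	}

	for _, p := range prompts {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	close(results)

	var latencies []time.Duration
	for d := range results {
		latencies = append(latencies, d)
	}
	return latencies
}

func main() {
	lat := runFixedConcurrency(make([]string, 100), 10)
	fmt.Println("completed requests:", len(lat))
}
```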
```
gollmperf/
├── cmd/            # Command-line interface
├── configs/        # Configuration file examples
├── examples/       # Sample data
├── internal/       # Core modules
│   ├── engine/     # Test engine
│   ├── collector/  # Data collector
│   ├── analyzer/   # Statistical analyzer
│   ├── reporter/   # Report generator
│   ├── config/     # Configuration management
│   ├── provider/   # Provider interface
│   └── utils/      # Utility functions
├── docs/           # Documentation
└── main.go         # Main program entry
```
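The sample dataset under examples/ (examples/test_cases.jsonl) is a JSONL file: one JSON object per line, one test case per object. The exact fields gollmperf expects are defined by that sample file; the lines below are only a hypothetical illustration of the general JSONL shape, not the tool's actual schema:

```jsonl
{"messages": [{"role": "user", "content": "Explain what a goroutine is in one sentence."}]}
{"messages": [{"role": "user", "content": "Summarize the TCP three-way handshake."}]}
```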
```bash
# Run stress testing
./gollmperf run --config ./configs/example.yaml

# Run batch testing with --batch flag
./gollmperf run --config ./configs/example.yaml --batch

# Run performance test mode with --perf flag
./gollmperf run --perf --config ./configs/example.yaml
```

In performance testing mode, the tool runs the test at each concurrency level listed in the perf_concurrency_group configuration parameter to find the optimal performance parameters.
To see all available options:

```bash
./gollmperf run -h
```

```text
  -k, --apikey string        API key
  -b, --batch                Run batch mode, to run all cases in the dataset
      --batch-result string  Batch results file path (output batch results to JSONL file)
  -d, --dataset string       Dataset file path
  -e, --endpoint string      Endpoint
  -f, --format string        Report format (json, csv, html) (defaults to the report file extension)
  -m, --model string         Model name
  -p, --perf                 Run perf mode, to find performance limits at different concurrency levels
  -P, --provider string      LLM provider (openai, qwen, etc.) (default "openai")
  -r, --report string        Report file path (output report to file)
```

Command-line arguments override the corresponding config file fields:
```bash
./gollmperf run --config ./configs/example.yaml --model gpt-3.5-turbo --dataset ./examples/test_cases.jsonl --report result.json --format json

# Run batch test and save results to JSONL file
./gollmperf run --config ./configs/example.yaml --batch --batch-result ./others/batch_results.jsonl
```

Build from source:

```bash
git clone https://github.com/FortuneW/gollmperf.git
cd gollmperf
go mod tidy
go build
```

Then point the example configuration at your endpoint via environment variables and run it:

```bash
export LLM_API_ENDPOINT="https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
export LLM_API_KEY="sk-xxx"
export LLM_MODEL_NAME="qwen-plus"

./gollmperf run -c configs/example.yaml
```

The results are as follows:
```text
[INF] 2025-08-06T11:13:15.830Z [reporter] ========== gollmperf Performance Report ==========
[INF] 2025-08-06T11:13:15.830Z [reporter] Total Duration: 4.212978738s
[INF] 2025-08-06T11:13:15.830Z [reporter] Total Requests: 5
[INF] 2025-08-06T11:13:15.830Z [reporter] Successful Requests: 5
[INF] 2025-08-06T11:13:15.830Z [reporter] Failed Requests: 0
[INF] 2025-08-06T11:13:15.830Z [reporter] Success Rate: 100.00%
[INF] 2025-08-06T11:13:15.830Z [reporter] QPS: 1.19
[INF] 2025-08-06T11:13:15.830Z [reporter] Tokens per second: 197.01
[INF] 2025-08-06T11:13:15.830Z [reporter] Average Latency: 3.650148772s
[INF] 2025-08-06T11:13:15.830Z [reporter] Latency P50: 3.822069212s
[INF] 2025-08-06T11:13:15.830Z [reporter] Latency P90: 4.212863638s
[INF] 2025-08-06T11:13:15.830Z [reporter] Latency P99: 4.212863638s
[INF] 2025-08-06T11:13:15.830Z [reporter] Average Request Tokens: 22.00
[INF] 2025-08-06T11:13:15.830Z [reporter] Average Response Tokens: 144.00
[INF] 2025-08-06T11:13:15.830Z [reporter] Average First Token Latency: 488.073612ms
[INF] 2025-08-06T11:13:15.830Z [reporter] First Token Latency P50: 512.180236ms
[INF] 2025-08-06T11:13:15.830Z [reporter] First Token Latency P90: 583.377086ms
[INF] 2025-08-06T11:13:15.830Z [reporter] First Token Latency P99: 583.377086ms
```

By default, a ./results/report.html report file will be generated in the current directory.
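The first-token-latency figures above are only meaningful when streaming is enabled (note stream: true in the configuration below). As a rough sketch, assuming an OpenAI-compatible chat-completions endpoint and reusing the environment variables from the quick-start commands, TTFT can be measured as the time from sending the request to reading the first streamed chunk:

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// Illustrative only: gollmperf's real measurement also handles timeouts,
// error responses, and token accounting.
func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  os.Getenv("LLM_MODEL_NAME"),
		"stream": true,
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello in one short sentence."},
		},
	})

	req, err := http.NewRequest("POST", os.Getenv("LLM_API_ENDPOINT"), bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+os.Getenv("LLM_API_KEY"))

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	var ttft time.Duration
	for scanner.Scan() {
		if ttft == 0 && len(bytes.TrimSpace(scanner.Bytes())) > 0 {
			ttft = time.Since(start) // first streamed SSE line => TTFT
		}
	}
	total := time.Since(start)
	fmt.Printf("TTFT: %v, total latency: %v\n", ttft, total)
}
```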
The full example configuration (configs/example.yaml):

```yaml
# Example configuration for gollmperf

# Test configuration
test:
  # Duration of the test
  duration: 60s
  # Warmup period before starting measurements
  warmup: 10s
  # Number of concurrent requests
  concurrency: 10
  # Timeout for each request
  timeout: 30s
  # Concurrency levels for performance testing mode
  perf_concurrency_group: [1, 2, 4, 8, 16, 20, 32, 40, 48, 64]

# Model configuration
model:
  # Model name
  name: ${LLM_MODEL_NAME}
  # Provider (openai, qwen, etc.)
  provider: openai
  # API endpoint (optional, uses default if not specified)
  endpoint: ${LLM_API_ENDPOINT}
  # API key (required)
  api_key: ${LLM_API_KEY}
  # HTTP headers, with any additional header fields
  headers:
    Content-Type: application/json
  # HTTP request params template, with any additional fields
  params_template:
    stream: true
    stream_options:
      include_usage: true
    extra_body:
      enable_thinking: false

# Dataset configuration
dataset:
  # Type of dataset (jsonl, etc.)
  type: jsonl
  # Path to dataset file
  path: ./examples/test_cases.jsonl

# Output configuration
output:
  # Output format (json, csv, html)
  format: html
  # Output file path
  path: ./results/report.html
  # Batch testing result file path (for saving batch test results in JSONL format)
  batch_result_path: ./results/batch_results.jsonl
```

Additional design goals and capabilities:

- Microsecond-level timing accuracy
- Network latency separation measurement
- GC impact exclusion
- Precise concurrency control
- QPS target control
- Adaptive load regulation
- Response content validation
- Token count accuracy
- Error classification statistics (a sketch follows this list)
- Multi-user support
- Permission control
- Log auditing
- Cluster deployment support
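Error classification statistics come down to bucketing failed requests by cause before aggregating. A minimal sketch with a hypothetical `classify` helper (gollmperf's real analyzer may use finer-grained categories):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// classify buckets a request outcome into a coarse category.
// Hypothetical helper for illustration only.
func classify(err error, statusCode int) string {
	var netErr net.Error
	switch {
	case err == nil && statusCode < 400:
		return "success"
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout"
	case errors.As(err, &netErr):
		return "network"
	case statusCode == 429:
		return "rate_limited"
	case statusCode >= 500:
		return "server_error"
	default:
		return "other"
	}
}

func main() {
	counts := map[string]int{}
	// Pretend these came from a test run.
	samples := []struct {
		err    error
		status int
	}{
		{nil, 200},
		{context.DeadlineExceeded, 0},
		{nil, 429},
		{nil, 503},
	}
	for _, s := range samples {
		counts[classify(s.err, s.status)]++
	}
	fmt.Println(counts) // map[rate_limited:1 server_error:1 success:1 timeout:1]
}
```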
The project has completed core functionality development, including:
- ✅ Project structure and basic framework
- ✅ Configuration management module
- ✅ Test execution module
- ✅ JSONL batch testing functionality
- ✅ Statistical analysis module
- ✅ Report output module (console, JSON, CSV, HTML)
- ✅ Command-line interface with flexible parameter configuration
- ✅ Basic testing and validation
- ✅ OpenAI provider implementation
- ✅ Qwen provider implementation
- ✅ Comprehensive metrics collection with error categorization
- ✅ Batch testing
- ✅ Stress testing
- ✅ Performance testing mode
Planned future work:

- Performance Optimization: Further enhance the testing tool's own performance
- Feature Expansion: Add more testing modes and metrics
- Provider Support: Increase support for more LLM providers
- Visualization Enhancement: Provide richer charts and dashboards
- Enterprise Features: Add user management, permission control, and other enterprise-level features
- Stress Testing Implementation: Complete the stress testing mode
- Comparative Testing Implementation: Add multi-model performance comparison capabilities
gollmperf provides a professional, accurate, and user-friendly solution for LLM performance testing. Its modular design and clear architecture not only meet current testing requirements but also leave room to extend the tool to future testing scenarios and needs.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

