# Benchmarks

This folder contains benchmarking tools and results for measuring the Alloy AI Gateway's performance overhead.
## Quick Start

```bash
# 1. Start the Alloy server (from project root)
OPENAI_API_KEY=your-key cargo run --release --bin alloy -- --port 3000

# 2. Run the gateway latency benchmark
cd benchmarks
rustc -O latency_bench.rs -o latency_bench && ./latency_bench

# 3. Run the end-to-end comparison (requires OPENAI_API_KEY)
./e2e_bench.sh
```

## latency_bench.rs: Gateway Latency

Measures the pure gateway overhead using persistent HTTP connections. This gives the most accurate measurement of how much latency Alloy adds.
```bash
rustc -O latency_bench.rs -o latency_bench
./latency_bench
```

What it measures:
- TCP write (send request)
- Gateway routing and processing
- Response serialization
- TCP read (receive response)
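The percentile summary reported in the results below can be computed with a few lines of Rust. The sketch that follows is illustrative, not the actual `latency_bench.rs`: the `percentile` helper is invented for the example, and the timed section is left as a placeholder comment.

```rust
use std::time::{Duration, Instant};

/// Return the `pct`-th percentile (0..=100) from an ascending-sorted
/// slice of latency samples, using nearest-rank indexing.
fn percentile(sorted: &[Duration], pct: f64) -> Duration {
    let idx = ((pct / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

fn main() {
    // Collect one sample per request over a persistent connection.
    let mut samples: Vec<Duration> = (0..1000)
        .map(|_| {
            let start = Instant::now();
            // ... write request, read response over the kept-alive socket ...
            start.elapsed()
        })
        .collect();
    samples.sort();

    let mean: Duration = samples.iter().sum::<Duration>() / samples.len() as u32;
    println!("min    {:?}", samples[0]);
    println!("median {:?}", percentile(&samples, 50.0));
    println!("mean   {:?}", mean);
    println!("p95    {:?}", percentile(&samples, 95.0));
    println!("p99    {:?}", percentile(&samples, 99.0));
    println!("max    {:?}", samples[samples.len() - 1]);
}
```

Sorting once and indexing is enough here; at 1,000+ samples the nearest-rank and interpolated percentile definitions differ by at most one sample.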
## e2e_bench.sh: End-to-End Comparison

Compares direct OpenAI API calls vs requests through Alloy to measure real-world overhead.
```bash
export OPENAI_API_KEY=your-key
./e2e_bench.sh
```

## micro_bench.rs: Micro-benchmarks

Measures individual operations (JSON parsing, serialization, lookups) to identify optimization opportunities.
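A micro-benchmark of this kind typically times a tight loop around a single operation. The following is a generic sketch, not the actual `micro_bench.rs` code; the `bench` helper and the route-table lookup are invented for illustration.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Run `op` for `iters` iterations and report nanoseconds per iteration.
fn bench<F: FnMut()>(name: &str, iters: u32, mut op: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    let ns_per_iter = start.elapsed().as_nanos() as f64 / iters as f64;
    println!("{name}: {ns_per_iter:.1} ns/iter");
    ns_per_iter
}

fn main() {
    let mut routes = HashMap::new();
    routes.insert("/v1/chat/completions", "openai");

    // Hypothetical micro-benchmark: cost of one route-table lookup.
    bench("route lookup", 1_000_000, || {
        // black_box keeps the optimizer from deleting the lookup entirely.
        std::hint::black_box(routes.get("/v1/chat/completions"));
    });
}
```

The `black_box` call matters: without it, a release build may constant-fold the whole loop away and report near-zero times.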
```bash
rustc -O micro_bench.rs -o micro_bench
./micro_bench
```

## memory_bench.sh: Memory Footprint

Measures memory footprint at rest and under load.
```bash
./memory_bench.sh
```

What it measures:
- Binary size on disk
- Baseline memory (RSS) at rest
- Memory growth under concurrent load
- Memory stability after chat requests
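On Linux, the resident set size (RSS) that this kind of benchmark reports can be read from `/proc/<pid>/status`. The Rust sketch below is illustrative only: `memory_bench.sh` itself is a shell script, and the `rss_mb` helper is invented for the example.

```rust
/// Parse the resident set size, in MB, out of /proc/<pid>/status-style
/// text. The kernel reports the VmRSS field in kB.
fn rss_mb(status: &str) -> Option<f64> {
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|kb| kb.parse::<f64>().ok())
        .map(|kb| kb / 1024.0)
}

fn main() {
    // Live value for the current process (Linux only).
    if let Ok(status) = std::fs::read_to_string("/proc/self/status") {
        if let Some(mb) = rss_mb(&status) {
            println!("RSS: {mb:.2} MB");
        }
    }
}
```

On macOS there is no `/proc`, so a script would shell out to `ps -o rss=` instead; the parsing idea is the same.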
## Results

### Gateway latency (latency_bench)

| Metric | Value |
|---|---|
| Min | 15.92 µs |
| Median | 23.63 µs |
| Mean | 25.29 µs |
| P95 | 37.79 µs |
| P99 | 56.96 µs |
| Max | 121.33 µs |
| Throughput | 39,540 req/s |
### End-to-end comparison (e2e_bench)

| Route | Avg Latency |
|---|---|
| Direct to OpenAI | ~1,112 ms |
| Through Alloy | ~817 ms |
| Gateway Overhead | ~24 µs (0.003%) |
### Memory footprint (memory_bench)

| Metric | Value |
|---|---|
| Binary size | 22.22 MB |
| Baseline memory (idle) | 10.15 MB |
| Under load (1000 reqs) | ~11.2 MB |
| Memory growth | ~1 MB |
Where the time goes in a typical request:

```
┌────────────────────────────────────────────────────────────────┐
│                  Typical LLM Request (~800ms)                  │
├────────────────────────────────────────────────────────────────┤
│ Alloy │  Network to OpenAI  │       OpenAI Processing          │
│  24µs │      ~200ms         │          ~600ms                  │
│ 0.003%│      (~25%)         │          (~75%)                  │
└────────────────────────────────────────────────────────────────┘
```
## Key Takeaways

- Gateway overhead is negligible (~24µs) compared to LLM API latency (~800ms)
- The overhead percentage is < 0.01% of total request time
- Network variance between runs is larger than the gateway overhead itself
- The gateway achieves ~40K requests/second throughput for health checks
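The overhead percentage follows directly from the two headline numbers; the tiny helper below (invented for this sanity check, not part of the benchmarks) just converts the units.

```rust
/// Gateway overhead as a percentage of total request time.
/// `overhead_us` is in microseconds, `request_ms` in milliseconds.
fn overhead_pct(overhead_us: f64, request_ms: f64) -> f64 {
    overhead_us / (request_ms * 1000.0) * 100.0
}

fn main() {
    // ~24 µs of gateway time inside a ~800 ms LLM request.
    println!("{:.3}%", overhead_pct(24.0, 800.0)); // 0.003%
}
```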
## Test Environment

- Hardware: Apple Silicon (M-series) / x86_64
- OS: macOS / Linux
- Rust: 1.75+ (release build with optimizations)
- Server: Alloy running in release mode
## Benchmarking Tips

- Always use release builds: `cargo build --release`
- Warm up the server: run a few hundred requests before measuring
- Use connection reuse: persistent connections eliminate TCP handshake overhead
- Run multiple iterations: at least 1,000 for statistical significance
- Minimize background processes: close other applications during benchmarking
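The warm-up and iteration tips combine naturally into a small harness. This is a generic sketch under those tips, not code from this repository; the `run_bench` helper is invented for illustration.

```rust
use std::time::{Duration, Instant};

/// Run `op` for `warmup` untimed iterations, then collect `iters`
/// timed samples suitable for percentile analysis.
fn run_bench<F: FnMut()>(warmup: u32, iters: u32, mut op: F) -> Vec<Duration> {
    for _ in 0..warmup {
        op(); // warm caches, the allocator, and any connection pools
    }
    (0..iters)
        .map(|_| {
            let t = Instant::now();
            op();
            t.elapsed()
        })
        .collect()
}

fn main() {
    let samples = run_bench(500, 1000, || {
        std::hint::black_box((0u64..64).sum::<u64>());
    });
    println!("collected {} samples", samples.len());
}
```

Keeping the warm-up phase untimed means the reported distribution reflects steady-state behavior rather than first-request costs.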