# Alloy Gateway Benchmarks

This folder contains benchmarking tools and results for measuring the Alloy AI Gateway's performance overhead.

## Quick Start

```bash
# 1. Start the Alloy server (from project root)
OPENAI_API_KEY=your-key cargo run --release --bin alloy -- --port 3000

# 2. Run the gateway latency benchmark
cd benchmarks
rustc -O latency_bench.rs -o latency_bench && ./latency_bench

# 3. Run the end-to-end comparison (requires OPENAI_API_KEY)
./e2e_bench.sh
```

## Benchmark Scripts

### 1. `latency_bench.rs` - Gateway Latency Measurement

Measures the pure gateway overhead using persistent HTTP connections. This gives the most accurate measurement of how much latency Alloy adds.

```bash
rustc -O latency_bench.rs -o latency_bench
./latency_bench
```

What it measures:

- TCP write (send request)
- Gateway routing and processing
- Response serialization
- TCP read (receive response)
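The measurement loop behind these steps can be sketched with the standard library alone. A local stub thread stands in for the gateway here (the real benchmark targets the running Alloy server's health endpoint), and the stub assumes each small loopback request arrives in a single read:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::time::Instant;

const RESPONSE: &[u8] =
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: keep-alive\r\n\r\nok";

/// Round-trip latency in microseconds for `n` requests sent over a single
/// persistent connection to a local stub server.
fn measure(n: usize) -> Vec<u64> {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    // Stub "gateway": answers every request on one persistent connection.
    std::thread::spawn(move || {
        let (mut conn, _) = listener.accept().unwrap();
        let mut buf = [0u8; 1024];
        while let Ok(len) = conn.read(&mut buf) {
            if len == 0 {
                break; // client closed the connection
            }
            conn.write_all(RESPONSE).unwrap();
        }
    });

    let mut stream = TcpStream::connect(addr).unwrap();
    stream.set_nodelay(true).unwrap(); // don't let Nagle batch the tiny writes
    let request = format!("GET /health HTTP/1.1\r\nHost: {addr}\r\n\r\n");
    let mut resp = vec![0u8; RESPONSE.len()];
    let mut samples = Vec::with_capacity(n);
    for _ in 0..n {
        let start = Instant::now();
        stream.write_all(request.as_bytes()).unwrap(); // TCP write
        stream.read_exact(&mut resp).unwrap();         // TCP read
        samples.push(start.elapsed().as_micros() as u64);
    }
    samples
}

fn main() {
    let mut samples = measure(1000);
    samples.sort_unstable();
    println!(
        "min {} µs, median {} µs, max {} µs",
        samples[0],
        samples[samples.len() / 2],
        samples[samples.len() - 1]
    );
}
```

Because the connection stays open, only the four steps listed above land inside the timed region; TCP and TLS handshakes never appear in the samples.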

### 2. `e2e_bench.sh` - End-to-End Comparison

Compares direct OpenAI API calls with requests routed through Alloy to measure real-world overhead.

```bash
export OPENAI_API_KEY=your-key
./e2e_bench.sh
```

### 3. `micro_bench.rs` - Component-Level Benchmarks

Measures individual operations (JSON parsing, serialization, lookups) to identify optimization opportunities.

```bash
rustc -O micro_bench.rs -o micro_bench
./micro_bench
```
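The general shape of such a micro-benchmark is a tight timed loop with the optimizer prevented from deleting the work. A standard-library-only sketch, with float parsing standing in for the real workloads (JSON parsing, serialization, lookups):

```rust
use std::hint::black_box;
use std::time::Instant;

/// Run `op` for `iters` iterations and return the average ns per operation.
fn bench<F: FnMut()>(iters: u32, mut op: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    let payload = "1735.5";
    // black_box hides the input and output from the optimizer so the
    // parse cannot be hoisted out of the loop or removed entirely.
    let ns = bench(1_000_000, || {
        let v: f64 = black_box(payload).parse().unwrap();
        black_box(v);
    });
    println!("parse f64: {ns:.1} ns/op");
}
```

Per the tips below, the iteration count matters: a million iterations amortizes timer resolution and smooths scheduler noise.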

### 4. `memory_bench.sh` - Memory Usage Benchmark

Measures memory footprint at rest and under load.

```bash
./memory_bench.sh
```

What it measures:

- Binary size on disk
- Baseline memory (RSS) at rest
- Memory growth under concurrent load
- Memory stability after chat requests
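On Linux, the resident set size (RSS) this script reports can also be read programmatically from `/proc`; a sketch (the mechanism is an assumption for illustration, the script itself may use `ps` or another tool):

```rust
use std::fs;

/// Resident set size in kB for `pid`, parsed from /proc/<pid>/status.
/// Linux only; returns None where /proc is unavailable (e.g. macOS).
fn rss_kb(pid: u32) -> Option<u64> {
    let status = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:")) // "VmRSS:   10392 kB"
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    match rss_kb(std::process::id()) {
        Some(kb) => println!("RSS: {:.2} MB", kb as f64 / 1024.0),
        None => println!("VmRSS unavailable (non-Linux system?)"),
    }
}
```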

## Results

### Gateway Latency (Health Endpoint)

| Metric | Value |
| --- | --- |
| Min | 15.92 µs |
| Median | 23.63 µs |
| Mean | 25.29 µs |
| P95 | 37.79 µs |
| P99 | 56.96 µs |
| Max | 121.33 µs |
| Throughput | 39,540 req/s |
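The percentile rows above can be reproduced from the raw samples with a nearest-rank calculation over the sorted data; a sketch (the benchmark itself may round or interpolate differently):

```rust
/// Nearest-rank percentile over already-sorted samples.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    let idx = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

fn main() {
    let mut samples = vec![30u64, 16, 24, 57, 121, 22, 38, 25, 23, 26];
    samples.sort_unstable(); // percentiles are defined over sorted data
    println!("p50 = {} µs", percentile(&samples, 50.0)); // → p50 = 26 µs
    println!("p95 = {} µs", percentile(&samples, 95.0)); // → p95 = 121 µs
}
```

Note how a single 121 µs outlier dominates P95 and up in a small sample, which is why the benchmark runs at least 1,000 iterations.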

### End-to-End LLM Requests

| Route | Avg Latency |
| --- | --- |
| Direct to OpenAI | ~1,112 ms |
| Through Alloy | ~817 ms |
| Gateway Overhead | ~24 µs (0.003%) |

Note that the Alloy route happened to measure *faster* than the direct route here: run-to-run network variance is orders of magnitude larger than the gateway's own overhead, so the two routes are statistically indistinguishable.

### Memory Usage

| Metric | Value |
| --- | --- |
| Binary size | 22.22 MB |
| Baseline memory (idle) | 10.15 MB |
| Under load (1,000 reqs) | ~11.2 MB |
| Memory growth | ~1 MB |

### Latency Breakdown

```
┌────────────────────────────────────────────────────────────────┐
│                  Typical LLM Request (~800ms)                  │
├────────────────────────────────────────────────────────────────┤
│ Alloy │  Network to OpenAI  │     OpenAI Processing            │
│ 24µs  │      ~200ms         │        ~600ms                    │
│ 0.003%│      (~25%)         │        (~75%)                    │
└────────────────────────────────────────────────────────────────┘
```

## Interpreting Results

- Gateway overhead is negligible (~24 µs) compared to LLM API latency (~800 ms)
- The overhead percentage is < 0.01% of total request time
- Network variance between runs is larger than the gateway overhead itself
- The gateway achieves ~40K requests/second throughput for health checks
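The overhead percentage in the first two points is plain arithmetic over the measured values:

```rust
fn main() {
    let gateway_overhead_us = 24.0; // measured gateway latency, µs
    let llm_request_ms = 800.0;     // typical end-to-end LLM call, ms
    // Convert ms → µs, then express the overhead as a percentage.
    let pct = gateway_overhead_us / (llm_request_ms * 1000.0) * 100.0;
    println!("overhead: {pct:.3}% of total request time"); // → overhead: 0.003%
}
```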

## Test Environment

- **Hardware:** Apple Silicon (M-series) / x86_64
- **OS:** macOS / Linux
- **Rust:** 1.75+ (release build with optimizations)
- **Server:** Alloy running in release mode

## Tips for Accurate Benchmarking

1. **Always use release builds:** `cargo build --release`
2. **Warm up the server:** run a few hundred requests before measuring
3. **Reuse connections:** persistent connections eliminate TCP handshake overhead
4. **Run multiple iterations:** at least 1,000 for statistical significance
5. **Minimize background processes:** close other applications while benchmarking