A high-performance, unified AI Gateway built in Rust. Alloy normalizes requests across multiple LLM providers (OpenAI, Anthropic, AWS Bedrock), manages traffic governance (concurrency, retries, load balancing), and integrates with the Model Context Protocol (MCP) for tooling.
Inspired by LiteLLM and Bifrost, but written in Rust and optimized for performance-critical workloads.
- Multi-Provider Support: Route requests to OpenAI, Anthropic, or AWS Bedrock
- OpenAI-Compatible API: Drop-in replacement for OpenAI API clients
- Streaming Support: Full SSE streaming for chat completions
- Traffic Governance: Concurrency limits, retry with exponential backoff, failover
- MCP Integration: Tool orchestration via Model Context Protocol
- Observability: Structured logging with tracing
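The retry-with-exponential-backoff behavior listed under Traffic Governance can be sketched as follows. This is an illustrative Python model, not the actual Rust implementation in `alloy-governance`; the `base_delay` and `cap` parameter names are assumptions for the sketch.

```python
import random

def backoff_delays(max_retries: int = 3, base_delay: float = 0.5,
                   cap: float = 30.0) -> list[float]:
    """Compute one delay per retry attempt: base * 2^attempt, capped,
    with full jitter to avoid synchronized retry storms."""
    delays = []
    for attempt in range(max_retries):
        upper = min(cap, base_delay * (2 ** attempt))
        delays.append(random.uniform(0, upper))  # full jitter
    return delays
```

Full jitter (a random delay in `[0, cap)`) is a common choice here because it spreads concurrent clients' retries apart instead of having them hammer a recovering provider in lockstep.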
Alloy is designed for minimal overhead. Benchmarks of the gateway's own per-request processing latency show negligible impact:
| Metric | Value |
|---|---|
| Min | 15.92 µs |
| Median | 23.63 µs |
| P99 | 56.96 µs |
| Throughput | 39,540 req/s |
For a typical LLM request (~800 ms), the gateway's ~24 µs median latency amounts to roughly 0.003% overhead.
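The overhead figure follows directly from the median latency in the table above:

```python
median_overhead_s = 23.63e-6   # median gateway latency (23.63 µs)
typical_request_s = 0.800      # ~800 ms end-to-end LLM request

overhead_pct = median_overhead_s / typical_request_s * 100
print(f"{overhead_pct:.3f}%")  # → 0.003%
```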
| Metric | Value |
|---|---|
| Binary size | 22.22 MB |
| Baseline memory (idle) | ~10 MB |
| Under load (1000 reqs) | ~11 MB |
See the `benchmarks/` folder for detailed benchmarking scripts and results.
- Rust 2021 edition
- API keys for your desired providers
```bash
cargo build --release
```
Alloy can be configured via:
- Configuration file (`config.toml`)
- Environment variables (e.g., `OPENAI_API_KEY`)
Example environment setup:
```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export AWS_REGION="us-east-1"
```
```bash
# Using environment variables
cargo run --bin alloy

# With custom port and config
cargo run --bin alloy -- --port 8080 --config custom-config.toml

# View all options
cargo run --bin alloy -- --help
```
```bash
# Liveness check
curl http://localhost:3000/health

# Readiness check
curl http://localhost:3000/ready

# List available models
curl http://localhost:3000/v1/models
```
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
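Streamed responses arrive as OpenAI-style server-sent events: `data: {...}` JSON chunks terminated by `data: [DONE]`. A minimal Python sketch of collecting the assistant text from such a stream, assuming the standard OpenAI chunk shape (`choices[0].delta.content`):

```python
import json

def collect_stream_text(sse_lines):
    """Concatenate delta content from OpenAI-style SSE lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # → Hello, world
```

A real client would read these lines incrementally off the HTTP response body rather than from a list.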
```
alloy/
├── alloy-core/        # Domain entities, schemas, and traits
├── alloy-providers/   # Adapter implementations for LLM APIs
├── alloy-governance/  # Traffic control, load balancing, resilience
├── alloy-mcp/         # Model Context Protocol implementation
└── alloy-server/      # HTTP server entry point and routing
```
| Crate | Description |
|---|---|
| `alloy-core` | Unified schemas (`ChatRequest`, `ChatResponse`), error types, and the `LLMProvider` trait |
| `alloy-providers` | OpenAI, Anthropic, and Bedrock provider implementations |
| `alloy-governance` | Concurrency limiting, retry policies, and failover logic |
| `alloy-mcp` | MCP tool definitions, Stdio/SSE transports, orchestration |
| `alloy-server` | Axum HTTP server, routing, handlers, configuration |
```toml
[server]
timeout_secs = 120
max_body_size = 10485760 # 10 MB

[providers.openai]
api_key = "sk-..."
timeout_secs = 60
max_concurrent = 10

[providers.anthropic]
api_key = "sk-ant-..."
timeout_secs = 60
max_concurrent = 10

[providers.bedrock]
region = "us-east-1"
profile = "default"
max_concurrent = 10

[governance]
max_retries = 3
failover_enabled = true
fallback_order = ["anthropic", "bedrock"]
```
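Taken together, the governance settings imply routing logic along these lines: attempt the requested provider up to `max_retries` times, then walk `fallback_order` when `failover_enabled` is set. A simplified Python sketch (the real logic lives in the Rust `alloy-governance` crate; the provider callables here are stand-ins):

```python
def call_with_failover(primary, fallbacks, providers, max_retries=3):
    """Try the primary provider up to max_retries times, then each
    fallback in order. `providers` maps name -> callable that returns
    a response or raises on failure (stand-ins for provider clients)."""
    last_err = None
    for name in [primary] + list(fallbacks):
        for _ in range(max_retries):
            try:
                return name, providers[name]()
            except Exception as err:
                last_err = err  # retry, then move to next provider
    raise RuntimeError(f"all providers failed: {last_err}")

def flaky():
    raise ConnectionError("upstream unavailable")

providers = {"openai": flaky, "anthropic": flaky, "bedrock": lambda: "ok"}
print(call_with_failover("openai", ["anthropic", "bedrock"], providers))
# → ('bedrock', 'ok')
```

In practice the retries would also be spaced out by the backoff policy rather than issued back-to-back.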
| Flag | Default | Description |
|---|---|---|
| `-p, --port` | `3000` | Server port |
| `-c, --config` | `config.toml` | Config file path |
| `--log-level` | `info` | Logging level |
| `--json-logs` | `false` | JSON log format |
| `--metrics` | `true` | Enable metrics |
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `AWS_REGION` | AWS region for Bedrock |
| `AWS_PROFILE` | AWS profile name |
| `ALLOY_PORT` | Server port |
| `ALLOY_CONFIG` | Config file path |
| `ALLOY_LOG_LEVEL` | Log level |
MIT