A high-performance, unified AI Gateway built in Rust. Alloy normalizes requests across multiple LLM providers (OpenAI, Anthropic, AWS Bedrock), manages traffic governance (concurrency, retries, load balancing), and integrates with the Model Context Protocol (MCP) for tooling.
Inspired by LiteLLM and Bifrost, but written in Rust and optimized for performance-critical workloads.
- Multi-Provider Support: Route requests to OpenAI, Anthropic, or AWS Bedrock
- OpenAI-Compatible API: Drop-in replacement for OpenAI API clients
- Streaming Support: Full SSE streaming for chat completions
- Traffic Governance: Concurrency limits, retry with exponential backoff, failover
- MCP Integration: Tool orchestration via Model Context Protocol
- Observability: Structured logging with tracing
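The retry-with-exponential-backoff behavior listed under Traffic Governance can be sketched as follows. This is an illustrative Python model, not the actual Rust implementation in `alloy-governance`; the `base_delay` and `cap` parameter names are assumptions for the sketch.

```python
import random

def backoff_delays(max_retries: int = 3, base_delay: float = 0.5,
                   cap: float = 30.0) -> list[float]:
    """Compute one delay per retry attempt: base * 2^attempt, capped,
    with full jitter to avoid synchronized retry storms."""
    delays = []
    for attempt in range(max_retries):
        upper = min(cap, base_delay * (2 ** attempt))
        delays.append(random.uniform(0, upper))  # full jitter
    return delays
```

Full jitter (a random delay in `[0, cap)`) is a common choice here because it spreads concurrent clients' retries apart instead of having them hammer a recovering provider in lockstep.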
Alloy is designed for minimal overhead. Benchmarks of the gateway's own per-request processing latency show negligible impact:
| Metric | Value |
|---|---|
| Min | 15.92 µs |
| Median | 23.63 µs |
| P99 | 56.96 µs |
| Throughput | 39,540 req/s |
For a typical LLM request (~800 ms), the gateway's ~24 µs median latency amounts to roughly 0.003% overhead.
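The overhead figure follows directly from the median latency in the table above:

```python
median_overhead_s = 23.63e-6   # median gateway latency (23.63 µs)
typical_request_s = 0.800      # ~800 ms end-to-end LLM request

overhead_pct = median_overhead_s / typical_request_s * 100
print(f"{overhead_pct:.3f}%")  # → 0.003%
```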
| Metric | Value |
|---|---|
| Binary size | 22.22 MB |
| Baseline memory (idle) | ~10 MB |
| Under load (1000 reqs) | ~11 MB |
See the `benchmarks/` folder for detailed benchmarking scripts and results.
- Rust 2021 edition
- API keys for your desired providers
```bash
cargo build --release
```
Alloy can be configured via:
- Configuration file (`config.toml`)
- Environment variables (e.g., `OPENAI_API_KEY`)
Example environment setup:
```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export AWS_REGION="us-east-1"
```
```bash
# Using environment variables
cargo run --bin alloy

# With custom port and config
cargo run --bin alloy -- --port 8080 --config custom-config.toml

# View all options
cargo run --bin alloy -- --help
```
```bash
# Liveness check
curl http://localhost:3000/health

# Readiness check
curl http://localhost:3000/ready

# List available models
curl http://localhost:3000/v1/models
```
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
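Streamed responses arrive as OpenAI-style server-sent events: `data: {...}` JSON chunks terminated by `data: [DONE]`. A minimal Python sketch of collecting the assistant text from such a stream, assuming the standard OpenAI chunk shape (`choices[0].delta.content`):

```python
import json

def collect_stream_text(sse_lines):
    """Concatenate delta content from OpenAI-style SSE lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # → Hello, world
```

A real client would read these lines incrementally off the HTTP response body rather than from a list.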
```
alloy/
├── alloy-core/        # Domain entities, schemas, and traits
├── alloy-providers/   # Adapter implementations for LLM APIs
├── alloy-governance/  # Traffic control, load balancing, resilience
├── alloy-mcp/         # Model Context Protocol implementation
└── alloy-server/      # HTTP server entry point and routing
```
| Crate | Description |
|---|---|
| `alloy-core` | Unified schemas (`ChatRequest`, `ChatResponse`), error types, and the `LLMProvider` trait |
| `alloy-providers` | OpenAI, Anthropic, and Bedrock provider implementations |
| `alloy-governance` | Concurrency limiting, retry policies, and failover logic |
| `alloy-mcp` | MCP tool definitions, Stdio/SSE transports, orchestration |
| `alloy-server` | Axum HTTP server, routing, handlers, configuration |
```toml
[server]
timeout_secs = 120
max_body_size = 10485760 # 10 MB

[providers.openai]
api_key = "sk-..."
timeout_secs = 60
max_concurrent = 10

[providers.anthropic]
api_key = "sk-ant-..."
timeout_secs = 60
max_concurrent = 10

[providers.bedrock]
region = "us-east-1"
profile = "default"
max_concurrent = 10

[governance]
max_retries = 3
failover_enabled = true
fallback_order = ["anthropic", "bedrock"]
```
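Taken together, the governance settings imply routing logic along these lines: attempt the requested provider up to `max_retries` times, then walk `fallback_order` when `failover_enabled` is set. A simplified Python sketch (the real logic lives in the Rust `alloy-governance` crate; the provider callables here are stand-ins):

```python
def call_with_failover(primary, fallbacks, providers, max_retries=3):
    """Try the primary provider up to max_retries times, then each
    fallback in order. `providers` maps name -> callable that returns
    a response or raises on failure (stand-ins for provider clients)."""
    last_err = None
    for name in [primary] + list(fallbacks):
        for _ in range(max_retries):
            try:
                return name, providers[name]()
            except Exception as err:
                last_err = err  # retry, then move to next provider
    raise RuntimeError(f"all providers failed: {last_err}")

def flaky():
    raise ConnectionError("upstream unavailable")

providers = {"openai": flaky, "anthropic": flaky, "bedrock": lambda: "ok"}
print(call_with_failover("openai", ["anthropic", "bedrock"], providers))
# → ('bedrock', 'ok')
```

In practice the retries would also be spaced out by the backoff policy rather than issued back-to-back.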
| Flag | Default | Description |
|---|---|---|
| `-p, --port` | `3000` | Server port |
| `-c, --config` | `config.toml` | Config file path |
| `--log-level` | `info` | Logging level |
| `--json-logs` | `false` | JSON log format |
| `--metrics` | `true` | Enable metrics |
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `AWS_REGION` | AWS region for Bedrock |
| `AWS_PROFILE` | AWS profile name |
| `ALLOY_PORT` | Server port |
| `ALLOY_CONFIG` | Config file path |
| `ALLOY_LOG_LEVEL` | Log level |
MIT