raja-jamwal/Alloy

Alloy - AI Gateway & Orchestration Layer

A high-performance, unified AI Gateway built in Rust. Alloy normalizes requests across multiple LLM providers (OpenAI, Anthropic, AWS Bedrock), manages traffic governance (concurrency, retries, load balancing), and integrates with the Model Context Protocol (MCP) for tooling.

Inspired by LiteLLM and Bifrost, but written in Rust and optimized for performance-critical workloads.

Features

  • Multi-Provider Support: Route requests to OpenAI, Anthropic, or AWS Bedrock
  • OpenAI-Compatible API: Drop-in replacement for OpenAI API clients
  • Streaming Support: Full SSE streaming for chat completions
  • Traffic Governance: Concurrency limits, retry with exponential backoff, failover
  • MCP Integration: Tool orchestration via Model Context Protocol
  • Observability: Structured logging with tracing
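
The traffic-governance bullet above glosses over how retry delays grow. Below is a minimal sketch of exponential backoff with a cap; the function name and parameters are illustrative, not Alloy's actual API:

```rust
use std::time::Duration;

/// Illustrative backoff schedule: the delay doubles on each attempt and is
/// capped at `max_delay`. Alloy's real retry policy may differ.
fn backoff_delay(attempt: u32, base: Duration, max_delay: Duration) -> Duration {
    base.checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(max_delay)
        .min(max_delay)
}

fn main() {
    let base = Duration::from_millis(100);
    let cap = Duration::from_secs(5);
    for attempt in 0..5 {
        // 100ms, 200ms, 400ms, 800ms, 1.6s
        println!("attempt {attempt}: wait {:?}", backoff_delay(attempt, base, cap));
    }
}
```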

Performance

Alloy is designed for minimal overhead. Benchmarks show negligible impact on request latency.

Latency Overhead

| Metric | Value |
|---|---|
| Min | 15.92 µs |
| Median | 23.63 µs |
| P99 | 56.96 µs |
| Throughput | 39,540 req/s |

For a typical LLM request (~800 ms), the gateway's median overhead of ~24 µs amounts to roughly 0.003% of total latency.

Memory Usage

| Metric | Value |
|---|---|
| Binary size | 22.22 MB |
| Baseline memory (idle) | ~10 MB |
| Under load (1,000 requests) | ~11 MB |

See the benchmarks/ folder for detailed benchmarking scripts and results.

Quick Start

Prerequisites

  • Rust toolchain with 2021 edition support (Rust 1.56 or later)
  • API keys for your desired providers

Installation

cargo build --release

Configuration

Alloy can be configured via:

  1. Configuration file (config.toml)
  2. Environment variables (e.g., OPENAI_API_KEY)

Example environment setup:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export AWS_REGION="us-east-1"

Running the Server

# Using environment variables
cargo run --bin alloy

# With custom port and config
cargo run --bin alloy -- --port 8080 --config custom-config.toml

# View all options
cargo run --bin alloy -- --help

API Endpoints

Health Check

curl http://localhost:3000/health

Readiness Check

curl http://localhost:3000/ready

List Models

curl http://localhost:3000/v1/models

Chat Completions (OpenAI-compatible)

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Streaming

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
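
With "stream": true, the response body arrives as OpenAI-style SSE data: lines ending with data: [DONE]. Assuming Alloy mirrors that framing, a client can peel out the JSON payloads like this (a sketch, not Alloy's own parser):

```rust
/// Extracts the JSON payloads from an OpenAI-style SSE body, stopping at
/// the `[DONE]` sentinel. A simplified sketch that assumes the whole body
/// is buffered; a real client would parse incrementally.
fn parse_sse_payloads(body: &str) -> Vec<String> {
    let mut payloads = Vec::new();
    for line in body.lines() {
        if let Some(data) = line.strip_prefix("data: ") {
            if data == "[DONE]" {
                break;
            }
            payloads.push(data.to_string());
        }
    }
    payloads
}

fn main() {
    let body = "data: {\"choices\":[]}\n\ndata: [DONE]\n";
    assert_eq!(parse_sse_payloads(body), vec!["{\"choices\":[]}"]);
}
```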

Architecture

alloy/
├── alloy-core/         # Domain entities, schemas, and traits
├── alloy-providers/    # Adapter implementations for LLM APIs
├── alloy-governance/   # Traffic control, load balancing, resilience
├── alloy-mcp/          # Model Context Protocol implementation
└── alloy-server/       # HTTP server entry point and routing

Crate Overview

| Crate | Description |
|---|---|
| alloy-core | Unified schemas (`ChatRequest`, `ChatResponse`), error types, and the `LLMProvider` trait |
| alloy-providers | OpenAI, Anthropic, and Bedrock provider implementations |
| alloy-governance | Concurrency limiting, retry policies, and failover logic |
| alloy-mcp | MCP tool definitions, Stdio/SSE transports, orchestration |
| alloy-server | Axum HTTP server, routing, handlers, configuration |
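
To illustrate the provider abstraction, here is a deliberately simplified, synchronous sketch; the real `LLMProvider` trait in alloy-core is presumably async and carries richer request/response types than shown here:

```rust
// Stand-in types for illustration only; not alloy-core's actual schemas.
struct ChatRequest { model: String, prompt: String }
struct ChatResponse { content: String }

/// Simplified provider trait: each backend adapter implements one method
/// that turns a unified request into a unified response.
trait LlmProvider {
    fn name(&self) -> &str;
    fn chat(&self, req: &ChatRequest) -> Result<ChatResponse, String>;
}

/// A mock provider that echoes the prompt, standing in for a real
/// OpenAI/Anthropic/Bedrock adapter.
struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn name(&self) -> &str { "echo" }
    fn chat(&self, req: &ChatRequest) -> Result<ChatResponse, String> {
        Ok(ChatResponse { content: format!("[{}] {}", req.model, req.prompt) })
    }
}

fn main() {
    let provider = EchoProvider;
    let req = ChatRequest { model: "gpt-4".into(), prompt: "Hello!".into() };
    let resp = provider.chat(&req).unwrap();
    println!("{}", resp.content); // prints "[gpt-4] Hello!"
}
```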

Configuration Reference

config.toml

[server]
timeout_secs = 120
max_body_size = 10485760  # 10 MiB

[providers.openai]
api_key = "sk-..."
timeout_secs = 60
max_concurrent = 10

[providers.anthropic]
api_key = "sk-ant-..."
timeout_secs = 60
max_concurrent = 10

[providers.bedrock]
region = "us-east-1"
profile = "default"
max_concurrent = 10

[governance]
max_retries = 3
failover_enabled = true
fallback_order = ["anthropic", "bedrock"]
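
The `[governance]` section implies a routing loop: try the requested provider first, then walk `fallback_order` until a call succeeds. A toy sketch of that logic, where the provider names and the boolean `call` stand-in are purely illustrative:

```rust
/// Illustrative failover: try the primary provider, then each entry in
/// `fallback_order`, returning the first provider whose call succeeds.
/// `call` stands in for an actual request attempt.
fn route<'a, F>(primary: &'a str, fallback_order: &'a [&'a str], mut call: F) -> Option<&'a str>
where
    F: FnMut(&str) -> bool,
{
    std::iter::once(primary)
        .chain(fallback_order.iter().copied())
        .find(|p| call(p))
}

fn main() {
    // Simulate the config above: openai fails, anthropic succeeds.
    let chosen = route("openai", &["anthropic", "bedrock"], |p| p == "anthropic");
    assert_eq!(chosen, Some("anthropic"));
}
```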

CLI Arguments

| Flag | Default | Description |
|---|---|---|
| `-p, --port` | `3000` | Server port |
| `-c, --config` | `config.toml` | Config file path |
| `--log-level` | `info` | Logging level |
| `--json-logs` | `false` | JSON log format |
| `--metrics` | `true` | Enable metrics |

Environment Variables

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `AWS_REGION` | AWS region for Bedrock |
| `AWS_PROFILE` | AWS profile name |
| `ALLOY_PORT` | Server port |
| `ALLOY_CONFIG` | Config file path |
| `ALLOY_LOG_LEVEL` | Log level |
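
Since both `config.toml` and environment variables can set the same options, here is a sketch of a plausible resolution for `ALLOY_PORT`: if the variable is set and parses, it wins; otherwise the config value is used. Whether Alloy actually gives environment variables precedence is an assumption:

```rust
use std::env;

/// Resolves the effective port, preferring an env-supplied value over the
/// config.toml value. The precedence order shown is an assumption.
fn effective_port(env_value: Option<&str>, config_port: u16) -> u16 {
    env_value.and_then(|v| v.parse().ok()).unwrap_or(config_port)
}

fn main() {
    let from_env = env::var("ALLOY_PORT").ok();
    println!("listening on port {}", effective_port(from_env.as_deref(), 3000));
}
```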

License

MIT
