One API for ALL LLMs
Unify your AI providers with intelligent key management, automatic failover, semantic caching, and comprehensive analytics. Bring your own keys; we handle the complexity.
Unio is a unified AI gateway that simplifies working with multiple LLM providers through a single, OpenAI-compatible API. Instead of managing separate integrations for OpenAI, Anthropic, Google, Groq, and others, Unio provides one interface with intelligent key rotation, automatic failover, semantic caching, and detailed analytics.
Key Benefits:
- Bring Your Own Keys - Use your existing API keys across all providers
- Smart Semantic Caching - Reduce costs and latency with vector-based response caching
- Knowledge Vaults (RAG) - Upload documents and augment responses with your own data
- Automatic Key Rotation - Intelligent load balancing and failover
- Advanced Analytics - Track usage, costs, and performance across providers with cache insights
- OpenAI SDK Compatible - Drop-in replacement for existing OpenAI integrations
- Smart Fallback System - Automatic provider switching on failures
- Key Verification - Validate credentials instantly upon addition
- Dynamic Model Catalog - Real-time discovery of available models from your providers
- Axiom - an open-source AI search engine in the spirit of Perplexity AI
1. Sign up at unio.chipling.xyz
2. Add your API keys for your preferred providers
3. Get your Unio API token
4. Start making requests!
```bash
curl -X POST https://api.unio.chipling.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_UNIO_TOKEN" \
  -d '{
    "model": "openai:gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

Prerequisites:
- Python 3.8+ (for local development) OR Docker (for containerized deployment)
- Node.js 16+ (for frontend)
- Supabase account (for database)
Option 1: Using Docker (Recommended)

```bash
# Clone the repository
git clone https://github.com/maoucodes/Unio.git
cd unio

# Configure environment
cd app
cp .env.example .env
# Edit .env with your Supabase credentials

# Build and run with Docker
cd ..
docker-compose up -d

# Or build and run manually
docker build -t unio-backend .
docker run -d -p 8000:8000 --env-file app/.env unio-backend
```

See docs/DOCKER.md for detailed Docker deployment instructions.
Option 2: Local Development

```bash
# Clone the repository
git clone https://github.com/maoucodes/Unio.git
cd unio

# Set up backend
cd app
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your Supabase credentials

# Run the server
uvicorn app:app --reload
```

```bash
# Set up frontend
cd frontend
npm install

# Configure environment
cp .env.example .env.local
# Edit .env.local with your configuration

# Run the development server
npm run dev
```

Visit http://localhost:5173 to access the dashboard.
| Provider | Models | Streaming | Function Calling |
|---|---|---|---|
| OpenAI | GPT-4, GPT-4o, GPT-3.5, etc. | Yes | Yes |
| Anthropic | Claude 3.5, Claude 3, etc. | Yes | Yes |
| Google | Gemini Pro, Gemini Flash | Yes | Yes |
| Groq | Llama 3, Mixtral, Gemma, etc. | Yes | No |
| Together | Llama 3, Qwen, etc. | Yes | No |
| OpenRouter | 100+ models | Yes | Yes |
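The Function Calling column refers to the standard OpenAI `tools` interface, which Unio passes through for providers that support it. Below is a minimal sketch using the OpenAI SDK; the `get_weather` tool is hypothetical and only illustrates the request shape:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-unio-token",
    base_url="https://api.unio.chipling.xyz/v1"
)

# Standard OpenAI tools schema; get_weather is a hypothetical example
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="openai:gpt-4o",  # choose a provider marked Yes in the table
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the call shows up here
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```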
```python
from openai import OpenAI

# Use Unio as a drop-in replacement
client = OpenAI(
    api_key="your-unio-token",
    base_url="https://api.unio.chipling.xyz/v1"
)

response = client.chat.completions.create(
    model="anthropic:claude-3-5-sonnet-20241022",
    messages=[
        {"role": "user", "content": "Hello from Unio!"}
    ]
)

print(response.choices[0].message.content)
```

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-unio-token',
  baseURL: 'https://api.unio.chipling.xyz/v1'
});

const response = await client.chat.completions.create({
  model: 'openai:gpt-4o',
  messages: [
    { role: 'user', content: 'Hello from Unio!' }
  ]
});

console.log(response.choices[0].message.content);
```

Streaming responses:

```python
stream = client.chat.completions.create(
    model="groq:llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Enable caching to reduce costs and latency for similar queries:
```python
response = client.chat.completions.create(
    model="openai:gpt-4o",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    extra_body={
        "cache_enabled": True,
        "cache_threshold": 0.95  # Similarity threshold (0-1)
    }
)
```

The cache uses vector similarity to match semantically similar prompts, even if worded differently.
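For intuition about what `cache_threshold` controls, here is a minimal, self-contained sketch of threshold-based vector matching. It is not Unio's implementation (Unio performs this server-side with pgvector); the in-memory cache and embeddings are stand-ins:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def lookup(cache, query_embedding, threshold=0.95):
    # cache is a list of (prompt_embedding, cached_response) pairs
    best_response, best_score = None, -1.0
    for prompt_embedding, response in cache:
        score = cosine_similarity(prompt_embedding, query_embedding)
        if score > best_score:
            best_response, best_score = response, score
    # Only a hit if the closest cached prompt clears the threshold
    return best_response if best_score >= threshold else None
```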
Augment responses with your own documents:
```python
# First, create a vault and upload documents via the dashboard
# Then reference it in your requests:
response = client.chat.completions.create(
    model="openai:gpt-4o",
    messages=[{"role": "user", "content": "What does our policy say about refunds?"}],
    extra_body={
        "vault_id": "your-vault-uuid"
    }
)
```

Use the X-Fallback-Model header for automatic failover:
```python
response = client.chat.completions.create(
    model="openai:gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "X-Fallback-Model": "anthropic:claude-3-5-sonnet-20241022"
    }
)
```

Semantic Caching:
- Vector-based similarity matching using pgvector
- Intent understanding - Matches semantically similar prompts, not just exact duplicates
- Customizable thresholds - Control matching sensitivity per request
- Streaming support - Cache hits stream back transparently
- Dashboard visibility - Cache hit/miss badges and performance metrics in logs
Knowledge Vaults (RAG):
- Document ingestion - Upload PDFs, text files, and more
- Vector embeddings - Automatic chunking and embedding
- Context augmentation - Retrieved chunks enhance LLM responses (see the sketch after this list)
- Vault management - Organize documents by project or use case
- RAG analytics - Track retrieved chunks and context usage
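As referenced above, here is a rough sketch of the chunk / embed / retrieve / augment flow that a vault performs server-side. The helper functions are illustrative stand-ins, not Unio's API, and embeddings are assumed to be unit-normalized numpy vectors:

```python
import numpy as np

def chunk(text, size=500):
    # Naive fixed-size chunking; real pipelines usually split on
    # sentence boundaries and overlap chunks slightly
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(question_vec, index, top_k=3):
    # index holds (embedding, chunk_text) pairs; with unit-normalized
    # embeddings, the dot product equals cosine similarity
    scored = sorted(index, key=lambda pair: -float(np.dot(pair[0], question_vec)))
    return [text for _, text in scored[:top_k]]

def augment(question, context_chunks):
    # Prepend retrieved chunks so the model answers from your documents
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question},
    ]
```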
Key Management:
- Secure storage with AES-256 encryption (a pattern sketch follows this list)
- Per-provider key organization
- Usage tracking and analytics
- Key health monitoring
- Instant verification on addition
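One common way to implement AES-256 storage is AES-256-GCM via the `cryptography` package. This is a generic pattern sketch, not necessarily the exact scheme Unio uses:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_api_key(plaintext, master_key):
    # GCM needs a fresh 12-byte nonce for every encryption
    nonce = os.urandom(12)
    ciphertext = AESGCM(master_key).encrypt(nonce, plaintext.encode(), None)
    return nonce + ciphertext  # store the nonce alongside the ciphertext

def decrypt_api_key(blob, master_key):
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(master_key).decrypt(nonce, ciphertext, None).decode()

# In practice the master key comes from a secrets manager or env var
master_key = AESGCM.generate_key(bit_length=256)
blob = encrypt_api_key("sk-your-provider-key", master_key)
assert decrypt_api_key(blob, master_key) == "sk-your-provider-key"
```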
Rotation & Failover:
- Automatic key rotation within providers (illustrated below)
- Smart fallback between providers
- Load balancing for high availability
- Rate limit handling
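To illustrate what rotation with rate-limit handling can look like, here is a conceptual sketch of a round-robin rotator that benches failing keys for a cooldown period; it is a mental model, not Unio's implementation:

```python
import itertools
import time

class KeyRotator:
    """Round-robin over one provider's keys, skipping keys that
    recently failed (e.g. hit a rate limit)."""

    def __init__(self, keys, cooldown=60.0):
        self._cycle = itertools.cycle(keys)
        self._count = len(keys)
        self._cooldown = cooldown
        self._benched = {}  # key -> time it becomes usable again

    def next_key(self):
        now = time.monotonic()
        for _ in range(self._count):
            key = next(self._cycle)
            if self._benched.get(key, 0.0) <= now:
                return key
        raise RuntimeError("all keys cooling down; fall back to another provider")

    def report_failure(self, key):
        # Bench the key so rotation routes around it for a while
        self._benched[key] = time.monotonic() + self._cooldown
```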
Analytics & Monitoring:
- Real-time usage metrics
- Cost tracking across providers
- Performance analytics (TTFT, tokens/sec; see the measurement sketch below)
- Request/response logging with cache insights
- Error rate monitoring
- RAG metadata (retrieved chunks, context preview)
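TTFT and throughput can also be measured client-side against the streaming API shown earlier. A minimal sketch, counting stream chunks as a rough proxy for tokens:

```python
import time
from openai import OpenAI

client = OpenAI(api_key="your-unio-token",
                base_url="https://api.unio.chipling.xyz/v1")

start = time.monotonic()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="openai:gpt-4o",
    messages=[{"role": "user", "content": "Explain TTFT in one sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.monotonic()  # time to first token
        chunks += 1

elapsed = time.monotonic() - start
if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.2f}s")
    # Chunk count approximates tokens; most chunks carry one token
    print(f"~{chunks / elapsed:.1f} chunks/s over the whole request")
```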
Developer Experience:
- OpenAI SDK compatibility
- Comprehensive error handling
- Structured logging
- Token usage tracking
- Request/response validation (see the Pydantic sketch below)
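The tech stack below lists Pydantic for validation; a request schema could look roughly like the following. The field set is illustrative, inferred from the parameters shown in the usage examples, not Unio's actual model:

```python
from typing import List, Literal, Optional
from pydantic import BaseModel, Field

class Message(BaseModel):
    role: Literal["system", "user", "assistant"]
    content: str

class ChatCompletionRequest(BaseModel):
    # "provider:model" naming, e.g. "openai:gpt-4o"
    model: str = Field(pattern=r"^[a-z0-9_-]+:.+$")
    messages: List[Message] = Field(min_length=1)
    stream: bool = False
    cache_enabled: bool = False
    cache_threshold: float = Field(default=0.95, ge=0.0, le=1.0)
    vault_id: Optional[str] = None

# Invalid payloads raise pydantic.ValidationError before any provider call
req = ChatCompletionRequest(
    model="openai:gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```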
Unio follows a modern microservices architecture:
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  React Frontend │    │ FastAPI Backend │    │   Supabase DB   │
│                 │────│                 │────│                 │
│ • Dashboard     │    │ • API Gateway   │    │ • Users         │
│ • Analytics     │    │ • Auth          │    │ • API Keys      │
│ • Key Mgmt      │    │ • Providers     │    │ • Logs          │
│ • Vaults        │    │ • Cache         │    │ • Vaults        │
│ • Docs          │    │ • RAG Engine    │    │ • Cache         │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │  AI Providers   │
                       │                 │
                       │ • OpenAI        │
                       │ • Anthropic     │
                       │ • Google        │
                       │ • Groq          │
                       │ • Together      │
                       │ • OpenRouter    │
                       └─────────────────┘
```
Backend:
- FastAPI (Python)
- Supabase (PostgreSQL + pgvector)
- Pydantic for validation
- Uvicorn ASGI server
Frontend:
- React 18 with TypeScript
- Vite build tool
- Tailwind CSS + shadcn/ui
- React Query for state management
- React Router for navigation
Infrastructure:
- Supabase for authentication & database
- Server-Sent Events for streaming
- RESTful API design
- Vector search (pgvector) for semantic caching
We welcome contributions! Please see our Contributing Guide for details.
1. Fork the repository

2. Clone your fork

   ```bash
   git clone https://github.com/your-username/unio.git
   cd unio
   ```

3. Set up the development environment

   ```bash
   # Backend
   cd app
   pip install -r requirements.txt
   cp .env.example .env
   # Edit .env with your Supabase credentials

   # Frontend
   cd ../frontend
   npm install
   cp .env.example .env.local
   # Edit .env.local with your configuration
   ```

4. Create a feature branch

   ```bash
   git checkout -b feature/amazing-feature
   ```

5. Make your changes and test

6. Submit a pull request
```bash
# Backend tests
cd app
python -m pytest tests/

# Frontend tests (when available)
cd frontend
npm test
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Email: meet.sonawane2015@gmail.com
- GitHub Issues: Report bugs or request features
- Documentation: Full API Documentation
- Semantic Caching - Intelligent vector-based response caching ✅ (v1.3.0)
- Knowledge Vaults (RAG) - Document-based context augmentation ✅ (v1.1.0)
- Custom Models - Support for self-hosted models
- Rate Limiting - Per-user and per-key limits
- Webhook Support - Real-time notifications
- Team Management - Collaborative key management
- Cost Optimization - Automatic cost-based routing
- More Providers - Additional LLM providers
Made with love by the Unio team
Star us on GitHub if you find Unio useful!