Skip to content

atopos31/llmio

Repository files navigation

LLMIO

English | 中文

LLMIO is a Go-based LLM load‑balancing gateway that provides a unified REST API, weighted scheduling, logging, and a modern admin UI for LLM clients (openclaw / claude code / codex / gemini cli / cherry studio / open webui). It helps you integrate OpenAI, Anthropic, Gemini, and other model capabilities in a single service.

QQ group: 1083599685

Architecture

LLMIO Architecture

Features

  • Unified API: Compatible with OpenAI Chat Completions, OpenAI Responses, Gemini Native, and Anthropic Messages. Supports both streaming and non‑streaming passthrough.
  • Weighted scheduling: balancers/ provides two strategies (random by weight / priority by weight). You can route based on tool calling, structured output, and multimodal capability.
  • Admin Web UI: React + TypeScript + Tailwind + Vite console for providers, models, associations, logs, and metrics.
  • Rate limiting & failure handling: Built‑in rate‑limit fallback and provider connectivity checks for fault isolation.
  • Local persistence: Pure Go SQLite (db/llmio.db) for config and request logs, ready to use out of the box.

Deployment

Docker Compose (Recommended)

services:
  llmio:
    image: atopos31/llmio:latest
    ports:
      - 7070:7070
    volumes:
      - ./db:/app/db
    environment:
      - GIN_MODE=release
      - TOKEN=<YOUR_TOKEN>
      - TZ=Asia/Shanghai
docker compose up -d

Docker

docker run -d \
  --name llmio \
  -p 7070:7070 \
  -v $(pwd)/db:/app/db \
  -e GIN_MODE=release \
  -e TOKEN=<YOUR_TOKEN> \
  -e TZ=Asia/Shanghai \
  atopos31/llmio:latest

Local Run

Download the release package for your OS/arch from releases (version > 0.5.13). Example for linux amd64:

wget https://github.com/atopos31/llmio/releases/download/v0.5.13/llmio_0.5.13_linux_amd64.tar.gz

Extract:

tar -xzf ./llmio_0.5.13_linux_amd64.tar.gz

Start:

GIN_MODE=release TOKEN=<YOUR_TOKEN> ./llmio

The service will create ./db/llmio.db in the current directory as the SQLite persistence file.

Environment Variables

Variable Description Default Notes
TOKEN Console login and API auth for /openai /anthropic /gemini /v1 None Required for public access
GIN_MODE Gin runtime mode debug Use release in production
LLMIO_SERVER_PORT Server listen port 7070 Service listen port
TZ Timezone for logs and scheduling Host default Recommend explicit setting in containers (e.g. Asia/Shanghai)
DB_VACUUM Run SQLite VACUUM on startup Disabled Set to true to reclaim space

Development

Clone:

git clone https://github.com/atopos31/llmio.git
cd llmio

Build frontend (pnpm required):

make webui

Run backend (Go >= 1.26.1):

TOKEN=<YOUR_TOKEN> make run

Web UI: http://localhost:7070/

API Endpoints

LLMIO provides a multi‑provider REST API with the following endpoints:

Provider Path Method Description Auth
OpenAI /openai/v1/models GET List available models Bearer Token
OpenAI /openai/v1/chat/completions POST Create chat completion Bearer Token
OpenAI /openai/v1/responses POST Create response Bearer Token
Anthropic /anthropic/v1/models GET List available models x-api-key
Anthropic /anthropic/v1/messages POST Create message x-api-key
Anthropic /anthropic/v1/messages/count_tokens POST Count tokens x-api-key
Gemini /gemini/v1beta/models GET List available models x-goog-api-key
Gemini /gemini/v1beta/models/{model}:generateContent POST Generate content x-goog-api-key
Gemini /gemini/v1beta/models/{model}:streamGenerateContent POST Stream content x-goog-api-key
Generic /v1/models GET List models (compat) Bearer Token
Generic /v1/chat/completions POST Create chat completion (compat) Bearer Token
Generic /v1/responses POST Create response (compat) Bearer Token
Generic /v1/messages POST Create message (compat) x-api-key
Generic /v1/messages/count_tokens POST Count tokens (compat) x-api-key

Authentication

LLMIO uses different auth headers depending on the endpoint:

1. OpenAI‑style endpoints (Bearer Token)

Applies to /openai/v1/* and OpenAI‑compatible endpoints under /v1/*.

curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:7070/openai/v1/models

2. Anthropic‑style endpoints (x-api-key)

Applies to /anthropic/v1/* and Anthropic‑compatible endpoints under /v1/*.

curl -H "x-api-key: YOUR_TOKEN" http://localhost:7070/anthropic/v1/messages

3. Gemini Native endpoints (x-goog-api-key)

Applies to /gemini/v1beta/* endpoints.

curl -H "x-goog-api-key: YOUR_TOKEN" http://localhost:7070/gemini/v1beta/models

For claude code or codex, use these environment variables:

export OPENAI_API_KEY=<YOUR_TOKEN>
export ANTHROPIC_API_KEY=<YOUR_TOKEN>
export GEMINI_API_KEY=<YOUR_TOKEN>

Note: /v1/* paths are kept for compatibility. Prefer the provider‑specific routes.

Project Structure

.
├─ main.go              # HTTP server entry and routes
├─ handler/             # REST handlers
├─ service/             # Business logic and load‑balancing
├─ middleware/          # Auth, rate limit, streaming middleware
├─ providers/           # Provider adapters
├─ balancers/           # Weight and scheduling strategies
├─ models/              # GORM models and DB init
├─ common/              # Shared helpers
├─ webui/               # React + TypeScript admin UI
└─ docs/                # Ops & usage docs

Screenshots

Dashboard

Associations

Logs

License

This project is released under the MIT License.

Star History

Stargazers over time