Installation Guide
Ryan Robson edited this page Sep 16, 2025
Complete installation instructions for every platform and use case.
| Method | Best For | Time | Skill Level |
|---|---|---|---|
| [🐳 Docker (Recommended)](#-docker-recommended) | Quick start, production | 5 min | Beginner |
| [📦 Pre-built Binaries](#-pre-built-binaries) | Simple deployment | 10 min | Beginner |
| [🛠️ Build from Source](#️-build-from-source) | Custom builds, development | 30 min | Intermediate |
| [☸️ Kubernetes](#️-kubernetes) | Enterprise deployment | 60 min | Advanced |
✅ Fastest setup | ✅ Consistent environment | ✅ Easy updates
- Docker 20.10+ installed (Get Docker)
- 8GB+ RAM available for containers
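Before pulling the image, it may be worth confirming the Docker daemon actually meets the 20.10+ requirement. A minimal sketch, assuming `docker` is on `PATH` and `sort` supports `-V`:

```shell
# Returns success when version $1 is at least version $2 (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the running daemon against the documented minimum:
#   version_ge "$(docker version --format '{{.Server.Version}}')" "20.10"
version_ge "24.0.2" "20.10" && echo "Docker is new enough"
```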
# Run Inferno with demo model
docker run -d \
--name inferno \
-p 8080:8080 \
-v inferno_models:/data/models \
inferno:latest serve
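The server may take a moment to load after the container starts, so the health check below can race it. A small polling helper (a sketch; the URL matches the port mapping above) waits for readiness instead:

```shell
# Poll a health URL until it answers 2xx, or give up after N tries.
wait_healthy() {
  url="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    curl -sf "$url" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# wait_healthy http://localhost:8080/health && echo "ready"
```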
# Test the installation
curl http://localhost:8080/health

# Create persistent volumes
docker volume create inferno_models
docker volume create inferno_cache
docker volume create inferno_config
# Run with persistent storage and configuration
docker run -d \
--name inferno \
--restart unless-stopped \
-p 8080:8080 \
-v inferno_models:/data/models \
-v inferno_cache:/data/cache \
-v inferno_config:/etc/inferno \
-v $(pwd)/inferno.toml:/etc/inferno/inferno.toml:ro \
inferno:latest serve --config /etc/inferno/inferno.toml

# Install nvidia-docker (Ubuntu/Debian)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update && sudo apt install -y nvidia-docker2
sudo systemctl restart docker
# Run with GPU support
docker run -d \
--name inferno \
--gpus all \
-p 8080:8080 \
-v inferno_models:/data/models \
inferno:latest serve --gpu

# Run with ROCm support
docker run -d \
--name inferno \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
-p 8080:8080 \
-v inferno_models:/data/models \
inferno:rocm serve --gpu

Create docker-compose.yml:
version: '3.8'

services:
  inferno:
    image: inferno:latest
    container_name: inferno
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - inferno_models:/data/models
      - inferno_cache:/data/cache
      - ./config:/etc/inferno:ro
    environment:
      - RUST_LOG=info
      - INFERNO_CONFIG=/etc/inferno/inferno.toml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  # Optional: Monitoring stack
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  inferno_models:
  inferno_cache:
  grafana_data:

Start with: docker-compose up -d
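The monitoring stack above mounts a `./prometheus.yml` that this guide does not show. A minimal sketch, assuming Inferno exposes Prometheus metrics on its API port (the job name is arbitrary; adjust the target if metrics live elsewhere):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: inferno
    static_configs:
      - targets: ["inferno:8080"]
```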
✅ No compilation needed | ✅ Platform optimized
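The commands below differ only in the release asset name, so a small helper (a sketch; asset names taken from the commands that follow) can pick the right one from `uname`:

```shell
# Map platform (uname -s) and architecture (uname -m) to a release asset.
asset_name() {
  case "$1-$2" in
    Linux-x86_64)  echo "inferno-linux-x86_64.tar.gz" ;;
    Darwin-arm64)  echo "inferno-macos-aarch64.tar.gz" ;;
    Darwin-x86_64) echo "inferno-macos-x86_64.tar.gz" ;;
    *) echo "unsupported platform: $1-$2" >&2; return 1 ;;
  esac
}

# wget "https://github.com/ringo380/inferno/releases/latest/download/$(asset_name "$(uname -s)" "$(uname -m)")"
```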
# Linux x86_64
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-x86_64.tar.gz
tar xzf inferno-linux-x86_64.tar.gz
sudo mv inferno /usr/local/bin/
# macOS (Apple Silicon)
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-macos-aarch64.tar.gz
tar xzf inferno-macos-aarch64.tar.gz
sudo mv inferno /usr/local/bin/
# macOS (Intel)
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-macos-x86_64.tar.gz
tar xzf inferno-macos-x86_64.tar.gz
sudo mv inferno /usr/local/bin/
# Windows
# Download inferno-windows-x86_64.zip from GitHub releases
# Extract and add to PATH

# Check version
inferno --version
# Verify capabilities
inferno system info
# Test basic functionality
inferno models list

# Create systemd service
sudo tee /etc/systemd/system/inferno.service > /dev/null <<EOF
[Unit]
Description=Inferno AI Inference Server
After=network.target
[Service]
Type=simple
User=inferno
Group=inferno
WorkingDirectory=/opt/inferno
ExecStart=/usr/local/bin/inferno serve --config /etc/inferno/inferno.toml
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
EOF
# Create user and directories
sudo useradd --system --shell /bin/false inferno
sudo mkdir -p /opt/inferno /etc/inferno /var/lib/inferno/models
sudo chown -R inferno:inferno /opt/inferno /var/lib/inferno
# Enable and start service
sudo systemctl enable inferno
sudo systemctl start inferno
sudo systemctl status inferno

✅ Latest features | ✅ Custom optimizations
- Rust: 1.70+ (Install Rust)
- Git: For cloning repository
- CMake: 3.15+ (for llama.cpp integration)
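Before cloning, a quick check that the toolchain is actually installed can save a failed build part-way through. A sketch, with tool names taken from the prerequisites above (`cargo` comes with the Rust install):

```shell
# Print the names of any requested tools that are not on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "$missing"
}

# missing_tools git cmake cargo   # empty output means you're ready to build
```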
Linux:
# Ubuntu/Debian
sudo apt update
sudo apt install build-essential cmake pkg-config libssl-dev git
# CentOS/RHEL/Fedora
sudo dnf install gcc gcc-c++ cmake openssl-devel pkgconfig git
# Arch Linux
sudo pacman -S base-devel cmake openssl pkg-config git

macOS:
# Install Xcode Command Line Tools
xcode-select --install
# Install Homebrew dependencies
brew install cmake pkg-config openssl git

Windows:
# Install Visual Studio Build Tools 2019/2022
# Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Install Git
# Download from: https://git-scm.com/download/win
# Install CMake
# Download from: https://cmake.org/download/

# Clone repository
git clone https://github.com/ringo380/inferno.git
cd inferno
# Build with all features
cargo build --release --all-features
# Or build with specific features
cargo build --release --features "gguf,onnx,gpu-metal"
# Install system-wide (Linux/macOS)
sudo cp target/release/inferno /usr/local/bin/
# Verify installation
inferno --version

# Available features
--features "gguf" # GGUF model support
--features "onnx" # ONNX model support
--features "gpu-metal" # Apple Metal GPU support
--features "gpu-vulkan" # Vulkan GPU support
--features "gpu-directml" # DirectML GPU support (Windows)
--features "download" # Model download capabilities
# Common combinations
cargo build --release --features "gguf,onnx" # Basic
cargo build --release --features "gguf,onnx,gpu-metal" # macOS
cargo build --release --features "gguf,onnx,gpu-vulkan" # Linux
cargo build --release --features "gguf,onnx,gpu-directml" # Windows
cargo build --release --all-features # Everything

# CPU-specific optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release
# Link-time optimization (slower build, faster runtime)
RUSTFLAGS="-C lto=fat" cargo build --release
# Combine optimizations
RUSTFLAGS="-C target-cpu=native -C lto=fat" cargo build --release --all-features

# Fast development builds
cargo build
# With file watching (install cargo-watch first)
cargo install cargo-watch
cargo watch -x "build"
# Run tests
cargo test
# Run with development logging
RUST_LOG=debug cargo run -- serve

✅ Enterprise scale | ✅ High availability | ✅ Auto-scaling
- Kubernetes 1.20+
- kubectl configured
- Helm 3.0+ (optional but recommended)
# Add Inferno Helm repository
helm repo add inferno https://charts.inferno.ai
helm repo update
# Install with default values
helm install inferno inferno/inferno
# Install with custom values
helm install inferno inferno/inferno -f values.yaml
# Upgrade
helm upgrade inferno inferno/inferno

Create inferno-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inferno
  labels:
    app: inferno
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inferno
  template:
    metadata:
      labels:
        app: inferno
    spec:
      containers:
        - name: inferno
          image: inferno:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "4Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi"
              cpu: "2000m"
              nvidia.com/gpu: 1
          env:
            - name: RUST_LOG
              value: "info"
          volumeMounts:
            - name: models
              mountPath: /data/models
            - name: config
              mountPath: /etc/inferno
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: inferno-models-pvc
        - name: config
          configMap:
            name: inferno-config
---
apiVersion: v1
kind: Service
metadata:
  name: inferno-service
spec:
  selector:
    app: inferno
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferno-config
data:
  inferno.toml: |
    models_dir = "/data/models"
    log_level = "info"

    [server]
    bind_address = "0.0.0.0"
    port = 8080

    [backend_config]
    gpu_enabled = true
    context_size = 4096
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: inferno-models-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Deploy:

kubectl apply -f inferno-deployment.yaml

# Add to deployment spec for GPU nodes
nodeSelector:
  accelerator: nvidia-tesla-k80

# Or use node affinity
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: accelerator
              operator: In
              values:
                - nvidia-tesla-k80
                - nvidia-tesla-v100

Create inferno.toml:
# Basic settings
models_dir = "/data/models"
cache_dir = "/data/cache"
log_level = "info"
[server]
bind_address = "0.0.0.0"
port = 8080
max_concurrent_requests = 100
[backend_config]
gpu_enabled = true
context_size = 4096
batch_size = 64
memory_map = true
[cache]
enabled = true
compression = "zstd"
max_size_gb = 10

# Override configuration via environment
export INFERNO_LOG_LEVEL=debug
export INFERNO_MODELS_DIR="/custom/models"
export INFERNO_SERVER__PORT=9090
export INFERNO_BACKEND_CONFIG__GPU_ENABLED=true

# Download popular models
inferno models download llama-2-7b-chat
inferno models download mistral-7b-instruct
inferno models download codellama-13b
# List available models
inferno models available
# Custom model download
inferno models download --url https://example.com/model.gguf --name custom-model

# Basic health check
curl http://localhost:8080/health
# Detailed system info
inferno system info
# List available models
inferno models list
# Test inference
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "your-model",
"messages": [{"role": "user", "content": "Hello!"}]
}'

# Benchmark your installation
inferno benchmark --model your-model --duration 60s
# Memory usage test
inferno test memory --model your-model
# GPU utilization test (if GPU available)
inferno test gpu --model your-model

Permission denied:
# Fix binary permissions
chmod +x /usr/local/bin/inferno
# Fix data directory permissions
sudo chown -R $USER:$USER /data/models

Port already in use:
# Check what's using port 8080
sudo lsof -i :8080
# Use different port
inferno serve --bind 0.0.0.0:8081

GPU not detected:
# Check GPU status
nvidia-smi # NVIDIA
rocm-smi # AMD
# Verify drivers
inferno system info --gpu

- Documentation: Troubleshooting
- Community: GitHub Discussions
- Issues: GitHub Issues
- Download Models: Model Management
- Configure Settings: Configuration Guide
- Try Examples: Usage Examples
- Monitor Performance: Monitoring Setup
- Production Deploy: Production Deployment
Installation guide updated for Inferno v1.0.0. Having issues? Check FAQ or visit GitHub Discussions!