Installation Guide
Ryan Robson edited this page Sep 16, 2025
Complete installation instructions for every platform and use case.
| Method | Best For | Time | Skill Level |
|---|---|---|---|
| [🐳 Docker (Recommended)](#-docker-recommended) | Quick start, production | 5 min | Beginner |
| [📦 Pre-built Binaries](#-pre-built-binaries) | Simple deployment | 10 min | Beginner |
| [🛠️ Build from Source](#️-build-from-source) | Custom builds, development | 30 min | Intermediate |
| [☸️ Kubernetes](#️-kubernetes) | Enterprise deployment | 60 min | Advanced |
✅ Fastest setup | ✅ Consistent environment | ✅ Easy updates
- Docker 20.10+ installed (Get Docker)
- 8GB+ RAM available for containers
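Before pulling the image, it may be worth confirming the Docker daemon actually meets the 20.10+ requirement. A minimal sketch, assuming `docker` is on `PATH` and `sort` supports `-V`:

```shell
# Returns success when version $1 is at least version $2 (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the running daemon against the documented minimum:
#   version_ge "$(docker version --format '{{.Server.Version}}')" "20.10"
version_ge "24.0.2" "20.10" && echo "Docker is new enough"
```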
# Run Inferno with demo model
docker run -d \
--name inferno \
-p 8080:8080 \
-v inferno_models:/data/models \
inferno:latest serve
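The server may take a moment to load after the container starts, so the health check below can race it. A small polling helper (a sketch; the URL matches the port mapping above) waits for readiness instead:

```shell
# Poll a health URL until it answers 2xx, or give up after N tries.
wait_healthy() {
  url="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    curl -sf "$url" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# wait_healthy http://localhost:8080/health && echo "ready"
```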
# Test the installation
curl http://localhost:8080/health

# Create persistent volumes
docker volume create inferno_models
docker volume create inferno_cache
docker volume create inferno_config
# Run with persistent storage and configuration
docker run -d \
--name inferno \
--restart unless-stopped \
-p 8080:8080 \
-v inferno_models:/data/models \
-v inferno_cache:/data/cache \
-v inferno_config:/etc/inferno \
-v $(pwd)/inferno.toml:/etc/inferno/inferno.toml:ro \
inferno:latest serve --config /etc/inferno/inferno.toml

# Install nvidia-docker (Ubuntu/Debian)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update && sudo apt install -y nvidia-docker2
sudo systemctl restart docker
# Run with GPU support
docker run -d \
--name inferno \
--gpus all \
-p 8080:8080 \
-v inferno_models:/data/models \
inferno:latest serve --gpu

# Run with ROCm support
docker run -d \
--name inferno \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
-p 8080:8080 \
-v inferno_models:/data/models \
inferno:rocm serve --gpu

Create docker-compose.yml:
version: '3.8'

services:
  inferno:
    image: inferno:latest
    container_name: inferno
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - inferno_models:/data/models
      - inferno_cache:/data/cache
      - ./config:/etc/inferno:ro
    environment:
      - RUST_LOG=info
      - INFERNO_CONFIG=/etc/inferno/inferno.toml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  # Optional: Monitoring stack
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  inferno_models:
  inferno_cache:
  grafana_data:

Start with: docker-compose up -d
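The monitoring stack above mounts a `./prometheus.yml` that this guide does not show. A minimal sketch, assuming Inferno exposes Prometheus metrics on its API port (the job name is arbitrary; adjust the target if metrics live elsewhere):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: inferno
    static_configs:
      - targets: ["inferno:8080"]
```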
✅ No compilation needed | ✅ Platform optimized
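The commands below differ only in the release asset name, so a small helper (a sketch; asset names taken from the commands that follow) can pick the right one from `uname`:

```shell
# Map platform (uname -s) and architecture (uname -m) to a release asset.
asset_name() {
  case "$1-$2" in
    Linux-x86_64)  echo "inferno-linux-x86_64.tar.gz" ;;
    Darwin-arm64)  echo "inferno-macos-aarch64.tar.gz" ;;
    Darwin-x86_64) echo "inferno-macos-x86_64.tar.gz" ;;
    *) echo "unsupported platform: $1-$2" >&2; return 1 ;;
  esac
}

# wget "https://github.com/ringo380/inferno/releases/latest/download/$(asset_name "$(uname -s)" "$(uname -m)")"
```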
# Linux x86_64
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-x86_64.tar.gz
tar xzf inferno-linux-x86_64.tar.gz
sudo mv inferno /usr/local/bin/
# macOS (Apple Silicon)
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-macos-aarch64.tar.gz
tar xzf inferno-macos-aarch64.tar.gz
sudo mv inferno /usr/local/bin/
# macOS (Intel)
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-macos-x86_64.tar.gz
tar xzf inferno-macos-x86_64.tar.gz
sudo mv inferno /usr/local/bin/
# Windows
# Download inferno-windows-x86_64.zip from GitHub releases
# Extract and add to PATH

# Check version
inferno --version
# Verify capabilities
inferno system info
# Test basic functionality
inferno models list

# Create systemd service
sudo tee /etc/systemd/system/inferno.service > /dev/null <<EOF
[Unit]
Description=Inferno AI Inference Server
After=network.target
[Service]
Type=simple
User=inferno
Group=inferno
WorkingDirectory=/opt/inferno
ExecStart=/usr/local/bin/inferno serve --config /etc/inferno/inferno.toml
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
EOF
# Create user and directories
sudo useradd --system --shell /bin/false inferno
sudo mkdir -p /opt/inferno /etc/inferno /var/lib/inferno/models
sudo chown -R inferno:inferno /opt/inferno /var/lib/inferno
# Enable and start service
sudo systemctl enable inferno
sudo systemctl start inferno
sudo systemctl status inferno

✅ Latest features | ✅ Custom optimizations
- Rust: 1.70+ (Install Rust)
- Git: For cloning repository
- CMake: 3.15+ (for llama.cpp integration)
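Before cloning, a quick check that the toolchain is actually installed can save a failed build part-way through. A sketch, with tool names taken from the prerequisites above (`cargo` comes with the Rust install):

```shell
# Print the names of any requested tools that are not on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "$missing"
}

# missing_tools git cmake cargo   # empty output means you're ready to build
```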
Linux:
# Ubuntu/Debian
sudo apt update
sudo apt install build-essential cmake pkg-config libssl-dev git
# CentOS/RHEL/Fedora
sudo dnf install gcc gcc-c++ cmake openssl-devel pkgconfig git
# Arch Linux
sudo pacman -S base-devel cmake openssl pkg-config git

macOS:
# Install Xcode Command Line Tools
xcode-select --install
# Install Homebrew dependencies
brew install cmake pkg-config openssl git

Windows:
# Install Visual Studio Build Tools 2019/2022
# Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Install Git
# Download from: https://git-scm.com/download/win
# Install CMake
# Download from: https://cmake.org/download/

# Clone repository
git clone https://github.com/ringo380/inferno.git
cd inferno
# Build with all features
cargo build --release --all-features
# Or build with specific features
cargo build --release --features "gguf,onnx,gpu-metal"
# Install system-wide (Linux/macOS)
sudo cp target/release/inferno /usr/local/bin/
# Verify installation
inferno --version

# Available features
--features "gguf" # GGUF model support
--features "onnx" # ONNX model support
--features "gpu-metal" # Apple Metal GPU support
--features "gpu-vulkan" # Vulkan GPU support
--features "gpu-directml" # DirectML GPU support (Windows)
--features "download" # Model download capabilities
# Common combinations
cargo build --release --features "gguf,onnx" # Basic
cargo build --release --features "gguf,onnx,gpu-metal" # macOS
cargo build --release --features "gguf,onnx,gpu-vulkan" # Linux
cargo build --release --features "gguf,onnx,gpu-directml" # Windows
cargo build --release --all-features # Everything

# CPU-specific optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release
# Link-time optimization (slower build, faster runtime)
RUSTFLAGS="-C lto=fat" cargo build --release
# Combine optimizations
RUSTFLAGS="-C target-cpu=native -C lto=fat" cargo build --release --all-features

# Fast development builds
cargo build
# With file watching (install cargo-watch first)
cargo install cargo-watch
cargo watch -x "build"
# Run tests
cargo test
# Run with development logging
RUST_LOG=debug cargo run -- serve

✅ Enterprise scale | ✅ High availability | ✅ Auto-scaling
- Kubernetes 1.20+
- kubectl configured
- Helm 3.0+ (optional but recommended)
# Add Inferno Helm repository
helm repo add inferno https://charts.inferno.ai
helm repo update
# Install with default values
helm install inferno inferno/inferno
# Install with custom values
helm install inferno inferno/inferno -f values.yaml
# Upgrade
helm upgrade inferno inferno/inferno

Create inferno-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inferno
  labels:
    app: inferno
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inferno
  template:
    metadata:
      labels:
        app: inferno
    spec:
      containers:
        - name: inferno
          image: inferno:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "4Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi"
              cpu: "2000m"
              nvidia.com/gpu: 1
          env:
            - name: RUST_LOG
              value: "info"
          volumeMounts:
            - name: models
              mountPath: /data/models
            - name: config
              mountPath: /etc/inferno
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: inferno-models-pvc
        - name: config
          configMap:
            name: inferno-config
---
apiVersion: v1
kind: Service
metadata:
  name: inferno-service
spec:
  selector:
    app: inferno
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferno-config
data:
  inferno.toml: |
    models_dir = "/data/models"
    log_level = "info"

    [server]
    bind_address = "0.0.0.0"
    port = 8080

    [backend_config]
    gpu_enabled = true
    context_size = 4096
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: inferno-models-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Deploy:

kubectl apply -f inferno-deployment.yaml

# Add to deployment spec for GPU nodes
nodeSelector:
  accelerator: nvidia-tesla-k80

# Or use node affinity
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: accelerator
              operator: In
              values:
                - nvidia-tesla-k80
                - nvidia-tesla-v100

Create inferno.toml:
# Basic settings
models_dir = "/data/models"
cache_dir = "/data/cache"
log_level = "info"
[server]
bind_address = "0.0.0.0"
port = 8080
max_concurrent_requests = 100
[backend_config]
gpu_enabled = true
context_size = 4096
batch_size = 64
memory_map = true
[cache]
enabled = true
compression = "zstd"
max_size_gb = 10

# Override configuration via environment
export INFERNO_LOG_LEVEL=debug
export INFERNO_MODELS_DIR="/custom/models"
export INFERNO_SERVER__PORT=9090
export INFERNO_BACKEND_CONFIG__GPU_ENABLED=true

# Download popular models
inferno models download llama-2-7b-chat
inferno models download mistral-7b-instruct
inferno models download codellama-13b
# List available models
inferno models available
# Custom model download
inferno models download --url https://example.com/model.gguf --name custom-model

# Basic health check
curl http://localhost:8080/health
# Detailed system info
inferno system info
# List available models
inferno models list
# Test inference
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "your-model",
"messages": [{"role": "user", "content": "Hello!"}]
}'

# Benchmark your installation
inferno benchmark --model your-model --duration 60s
# Memory usage test
inferno test memory --model your-model
# GPU utilization test (if GPU available)
inferno test gpu --model your-model

Permission denied:
# Fix binary permissions
chmod +x /usr/local/bin/inferno
# Fix data directory permissions
sudo chown -R $USER:$USER /data/models

Port already in use:
# Check what's using port 8080
sudo lsof -i :8080
# Use different port
inferno serve --bind 0.0.0.0:8081

GPU not detected:
# Check GPU status
nvidia-smi # NVIDIA
rocm-smi # AMD
# Verify drivers
inferno system info --gpu

- Documentation: Troubleshooting
- Community: GitHub Discussions
- Issues: GitHub Issues
- Download Models: Model Management
- Configure Settings: Configuration Guide
- Try Examples: Usage Examples
- Monitor Performance: Monitoring Setup
- Production Deploy: Production Deployment
Installation guide updated for Inferno v1.0.0. Having issues? Check FAQ or visit GitHub Discussions!