

Ryan Robson edited this page Sep 16, 2025 · 2 revisions

πŸ’» System Requirements

This guide helps you determine if your system can run Inferno and what performance to expect.

🎯 Quick Check

Minimum for basic usage:

  • RAM: 8GB (16GB recommended)
  • Storage: 20GB free space (plus model storage)
  • CPU: Any modern x64 processor
  • OS: Linux, macOS, or Windows

For optimal performance:

  • RAM: 32GB+ for large models
  • GPU: NVIDIA RTX 3060 or better, Apple M1/M2, AMD RX 6600+
  • Storage: SSD recommended, 100GB+ for multiple models
  • Network: For model downloads (optional after setup)
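
If you're not sure where your machine stands, a few standard commands cover the basics. This is a quick sketch for Linux (macOS users can substitute `sysctl -n hw.ncpu` and `hw.memsize`); the thresholds in the comments mirror the minimums above.

```shell
# Quick check of this host against the minimums above (Linux commands).
cores=$(nproc)
ram_gb=$(free -g | awk '/^Mem:/ {print $2}')
free_disk=$(df -h . | awk 'NR==2 {print $4}')
echo "CPU cores: $cores"
echo "RAM: ${ram_gb}GB (8GB minimum, 16GB recommended)"
echo "Free disk here: $free_disk (20GB+ recommended, plus model storage)"
```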

πŸ–₯️ Platform-Specific Requirements

Linux (Recommended)

βœ… Best Performance | βœ… Full GPU Support | βœ… Docker Native

Supported Distributions

  • Ubuntu: 20.04 LTS, 22.04 LTS, 24.04 LTS
  • Debian: 11 (Bullseye), 12 (Bookworm)
  • CentOS/RHEL: 8, 9
  • Fedora: 37, 38, 39
  • Arch Linux: Rolling release
  • openSUSE: Leap 15.4+, Tumbleweed

System Dependencies

```bash
# Ubuntu/Debian
sudo apt update
sudo apt install build-essential curl pkg-config libssl-dev

# CentOS/RHEL/Fedora
sudo dnf install gcc gcc-c++ curl openssl-devel pkgconfig

# Arch Linux
sudo pacman -S base-devel curl openssl pkg-config
```
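
Before installing, you can check which of these prerequisites are already on `PATH`. A minimal, distribution-agnostic sketch:

```shell
# Check whether the build prerequisites are already available.
missing=0
for tool in cc curl pkg-config; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing - install it with your package manager"
    missing=$((missing + 1))
  fi
done
echo "$missing prerequisite(s) missing"
```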

GPU Support

NVIDIA GPUs (Recommended):

  • Driver: 470+ (525+ for RTX 4000 series)
  • CUDA: 11.8+ (auto-installed with ONNX Runtime)
  • Compute Capability: 6.0+ (GTX 1060 and newer)
  • VRAM: 6GB minimum, 12GB+ recommended
```bash
# Install NVIDIA drivers (Ubuntu)
sudo ubuntu-drivers autoinstall
# Or manually:
sudo apt install nvidia-driver-525

# Verify installation
nvidia-smi
```
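
To compare your card against the minimums above, `nvidia-smi` can report the relevant fields directly. A sketch (note `compute_cap` is only queryable on reasonably recent drivers):

```shell
# Check driver version, VRAM, and compute capability against the minimums above.
have_nvidia=0
if command -v nvidia-smi >/dev/null 2>&1; then
  have_nvidia=1
  nvidia-smi --query-gpu=name,driver_version,memory.total,compute_cap --format=csv
else
  echo "nvidia-smi not found - no NVIDIA driver installed"
fi
```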

AMD GPUs:

  • ROCm: 5.0+ for RX 6000 series and newer
  • VRAM: 8GB minimum
```bash
# Install ROCm (Ubuntu 22.04)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_*_all.deb
sudo dpkg -i amdgpu-install_*_all.deb
sudo amdgpu-install --usecase=rocm
```

Intel GPUs:

  • Arc A-Series: A380, A750, A770
  • Integrated: Iris Xe, UHD Graphics (limited performance)

macOS

βœ… Great for Development | βœ… Apple Silicon Support | ⚠️ Limited GPU Options

Supported Versions

  • macOS: 12.0 (Monterey) or later
  • Recommended: macOS 13.0+ for best Metal performance

Hardware Requirements

Apple Silicon (M1/M2/M3):

  • Unified Memory: 16GB minimum, 32GB+ for large models
  • Metal: Native support, excellent performance
  • Neural Engine: Automatic acceleration for compatible operations

Intel Macs:

  • RAM: 16GB minimum (no unified memory benefits)
  • GPU: Dedicated GPU recommended (AMD Radeon Pro)
  • Note: Performance significantly lower than Apple Silicon

Installation Dependencies

```bash
# Install Xcode Command Line Tools
xcode-select --install

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install required tools
brew install cmake pkg-config openssl
```

Windows

βœ… Full DirectML Support | ⚠️ Requires WSL2 for Best Experience

Supported Versions

  • Windows: 10 (1903+), 11
  • WSL2: Ubuntu 22.04 LTS (recommended for Linux performance)

GPU Support

NVIDIA GPUs:

  • DirectML: Native Windows support
  • CUDA: Full support via WSL2
  • Driver: Game Ready 472+ or Studio 472+

AMD GPUs:

  • DirectML: Native Windows support
  • Driver: Adrenalin 22.10.1+

Intel GPUs:

  • DirectML: Arc A-Series support
  • Driver: Latest Intel Arc drivers

Windows Native Setup

```powershell
# Install Visual Studio Build Tools
# Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/

# Install Rust
# Download from: https://rustup.rs/

# Install Git
# Download from: https://git-scm.com/download/win
```

WSL2 Setup (Recommended)

```powershell
# Enable WSL2
wsl --install -d Ubuntu-22.04

# Inside WSL2, follow the Linux (Ubuntu) instructions above
```
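
Once inside the WSL2 distribution, you can confirm GPU passthrough is wired up: WSL2 exposes GPU paravirtualization to Linux through the `/dev/dxg` device. A quick check:

```shell
# Inside WSL2: check that GPU paravirtualization is exposed to Linux.
wsl_gpu=0
if [ -e /dev/dxg ]; then
  wsl_gpu=1
  echo "GPU passthrough available in this WSL2 instance"
else
  echo "/dev/dxg not found - update Windows and the Windows-side GPU driver"
fi
```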

πŸš€ Hardware Performance Guide

Model Size vs. Hardware Requirements

| Model Size | Minimum RAM | Recommended RAM | GPU VRAM | Performance |
|------------|-------------|-----------------|----------|-------------|
| 7B parameters | 8GB | 16GB | 6GB | Good |
| 13B parameters | 16GB | 32GB | 12GB | Very Good |
| 30B parameters | 32GB | 64GB | 24GB | Excellent |
| 70B parameters | 64GB | 128GB | 48GB+ | Outstanding |
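
The figures above follow roughly from parameter count times bytes per weight. As a back-of-envelope sketch (the 20% overhead factor for KV cache and runtime buffers is an illustrative assumption, not a measured constant):

```shell
# Rough memory estimate for a quantized model:
# weight bytes ~= parameters * (quantization bits / 8), plus ~20% overhead.
params_b=13   # parameters, in billions
bits=4        # quantization width (4-bit is a common GGUF default)
weights_gb=$(( params_b * bits / 8 ))
total_gb=$(( weights_gb + weights_gb / 5 ))
echo "~${weights_gb}GB weights, ~${total_gb}GB in use for a ${params_b}B model at ${bits}-bit"
```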

GPU Performance Tiers

Tier 1: Enthusiast πŸš€

  • NVIDIA: RTX 4090, RTX 4080, A6000
  • Apple: M2 Ultra, M3 Ultra (32GB+ unified memory)
  • Performance: Run 70B+ models smoothly

Tier 2: High Performance ⚑

  • NVIDIA: RTX 4070, RTX 3080, RTX 3090
  • Apple: M2 Pro, M3 Pro (16GB+ unified memory)
  • AMD: RX 7800 XT, RX 6800 XT
  • Performance: Handle 30B models well, 70B with careful management

Tier 3: Good Performance πŸ’ͺ

  • NVIDIA: RTX 4060, RTX 3070, RTX 3060 Ti
  • Apple: M1 Pro, M2 (16GB unified memory)
  • AMD: RX 6700 XT, RX 7600
  • Performance: Great for 13B models, acceptable for 30B

Tier 4: Entry Level πŸ“ˆ

  • NVIDIA: RTX 3060, GTX 1660 Ti
  • Apple: M1 (8GB unified memory)
  • AMD: RX 6600, RX 5700 XT
  • Intel: Arc A750, A770
  • Performance: Perfect for 7B models

πŸ’Ύ Storage Requirements

Model Storage Needs

  • 7B model: 4-8GB per model
  • 13B model: 8-16GB per model
  • 30B model: 20-40GB per model
  • 70B model: 40-80GB per model
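
To see what your downloaded models actually consume, `du` gives a quick per-model breakdown. The `MODELS_DIR` path below is a placeholder; point it at wherever Inferno stores models on your system:

```shell
# Per-model and total disk usage (MODELS_DIR is a placeholder path).
MODELS_DIR="${MODELS_DIR:-$HOME/models}"
if [ -d "$MODELS_DIR" ]; then
  du -sh "$MODELS_DIR"/* 2>/dev/null | sort -rh   # largest models first
  du -sh "$MODELS_DIR"                            # total
else
  echo "no models directory at $MODELS_DIR yet"
fi
```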

Recommended Storage Setup

πŸ—‚οΈ Inferno Storage Layout
β”œβ”€β”€ πŸ“ /models/           (100-500GB)
β”‚   β”œβ”€β”€ llama-2-7b.gguf   (7GB)
β”‚   β”œβ”€β”€ mistral-7b.gguf   (7GB)
β”‚   └── codellama-13b.gguf (13GB)
β”œβ”€β”€ πŸ“ /cache/            (10-50GB)
β”‚   └── response_cache/
β”œβ”€β”€ πŸ“ /logs/             (1-10GB)
β”‚   └── audit_logs/
└── πŸ“ /config/           (<1GB)
    └── inferno.toml

Storage Performance Impact

  • HDD: 20-50 tokens/sec (acceptable)
  • SATA SSD: 50-100 tokens/sec (good)
  • NVMe SSD: 100-200+ tokens/sec (excellent)
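
A crude way to see which tier your model drive falls into is a sequential-write test with `dd`; the MB/s figure it reports separates HDD, SATA SSD, and NVMe clearly. This sketch writes and then removes a 256MB file in the current directory:

```shell
# Rough sequential-write check for your model storage.
testfile="./inferno_io_test.tmp"
dd if=/dev/zero of="$testfile" bs=1M count=256 conv=fsync 2>&1 | tail -n 1
rm -f "$testfile"
```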

🌐 Network Requirements

Initial Setup

  • Model Downloads: 5-80GB per model (one-time)
  • Bandwidth: 10+ Mbps recommended for faster downloads
  • Alternative: Transfer models via USB/external drive
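
Download time is easy to estimate from file size and link speed (remembering gigabytes vs. megabits). A sketch with illustrative numbers:

```shell
# Estimate how long a model download takes at a given link speed.
size_gb=40   # e.g. a quantized 70B model
mbps=100     # measured bandwidth, in megabits per second
seconds=$(( size_gb * 8 * 1000 / mbps ))
echo "~$(( seconds / 60 )) minutes to fetch ${size_gb}GB at ${mbps} Mbps"
```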

Runtime Operation

  • Network: Not required (fully offline capable)
  • Updates: Optional, only for Inferno updates
  • Monitoring: Optional external metrics collection

πŸ”§ Compatibility Matrix

Model Format Support by Platform

| Format | Linux | macOS | Windows | Performance |
|--------|-------|-------|---------|-------------|
| GGUF | βœ… Full | βœ… Full | βœ… Full | Excellent |
| ONNX | βœ… Full | βœ… Full | βœ… Full | Excellent |
| PyTorch | βœ… Convert | βœ… Convert | βœ… Convert | Via conversion |
| SafeTensors | βœ… Convert | βœ… Convert | βœ… Convert | Via conversion |

GPU Acceleration Support

| GPU Vendor | Linux | macOS | Windows | API |
|------------|-------|-------|---------|-----|
| NVIDIA | βœ… CUDA/Vulkan | ❌ No | βœ… DirectML/CUDA | CUDA, DirectML |
| AMD | βœ… ROCm/Vulkan | ❌ No | βœ… DirectML | ROCm, DirectML |
| Intel | βœ… Vulkan | ❌ No | βœ… DirectML | Vulkan, DirectML |
| Apple | N/A | βœ… Metal | N/A | Metal |

πŸ§ͺ Performance Testing

Benchmark Your System

```bash
# Check the installed version
inferno --version

# Test with a small model
inferno run --model test-7b --prompt "Hello world" --benchmark

# Check GPU utilization
inferno system info --gpu

# Performance test
inferno benchmark --model your-model --duration 60s
```

Expected Performance Baselines

7B Model Performance (tokens/second)

  • RTX 4090: 150-200 tok/s
  • RTX 3080: 80-120 tok/s
  • M2 Pro: 60-80 tok/s
  • RTX 3060: 40-60 tok/s

13B Model Performance (tokens/second)

  • RTX 4090: 80-120 tok/s
  • RTX 3080: 40-60 tok/s
  • M2 Pro: 30-40 tok/s
  • RTX 3060: 15-25 tok/s
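
To compare your own runs against these baselines, convert a timed generation into tokens per second. A sketch with illustrative numbers:

```shell
# Convert a timed generation run into tok/s.
tokens=512        # tokens generated in the run
elapsed_ms=6400   # wall-clock time for the run, in milliseconds
tok_per_s=$(( tokens * 1000 / elapsed_ms ))
echo "${tok_per_s} tok/s"
```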

❓ Need Help?

  • Performance Issues?
  • Hardware Questions?
  • Want to Upgrade?


System requirements updated for Inferno v1.0.0. Check the Changelog for the latest updates.
