Ryan Robson edited this page Sep 16, 2025 · 3 revisions

โ“ Frequently Asked Questions

Common questions and answers about Inferno. Can't find what you're looking for? Visit GitHub Discussions!

🎯 General Questions

What is Inferno?

Inferno is a production-ready AI inference server that runs entirely on your local hardware. Think of it as your private ChatGPT that works offline, supports multiple model formats, and gives you complete control over your AI infrastructure.

How is Inferno different from cloud AI services?

| Aspect | Cloud AI (OpenAI, etc.) | Inferno |
| --- | --- | --- |
| Privacy | Data sent to external servers | 100% local processing |
| Internet | Required for every request | Works completely offline |
| Models | Limited to provider's models | Use any model you want |
| Speed | Network dependent | Local hardware speed |
| Control | Limited customization | Full control over everything |

Is Inferno really free?

Yes! Inferno is completely open source under MIT/Apache 2.0 licenses. You pay only for:

  • Your hardware (computer, GPU)
  • Electricity to run it
  • Models (many are free, some commercial models require licenses)

No subscription fees, no per-token costs, no vendor lock-in.

What can I do with Inferno?

  • Chat with AI models (like ChatGPT, but private)
  • Generate code (coding assistants, debugging)
  • Process documents (summarization, analysis, translation)
  • Create content (writing, brainstorming, creative tasks)
  • Automate workflows (batch processing, API integration)
  • Build AI-powered applications (embed in your software)

๐Ÿ› ๏ธ Technical Questions

What models does Inferno support?

Format Support:

  • ✅ GGUF - Optimized format for CPU/GPU inference (llama.cpp compatible)
  • ✅ ONNX - Cross-platform ML format (PyTorch, TensorFlow, scikit-learn exports)
  • ✅ PyTorch - Via conversion to GGUF/ONNX
  • ✅ SafeTensors - Via conversion to GGUF/ONNX

Popular Models:

  • Llama 2 (7B, 13B, 70B) - General purpose, code, chat
  • Mistral (7B, 8x7B) - High performance, multilingual
  • CodeLlama (7B, 13B, 34B) - Code generation and completion
  • Vicuna, Alpaca, WizardLM - Various specialized models
  • Custom models - Any GGUF or ONNX model

What hardware do I need?

Minimum (7B models):

  • 8GB RAM, any modern CPU
  • 20GB storage space
  • No GPU required (but helpful)

Recommended (13B models):

  • 16GB RAM, multi-core CPU
  • NVIDIA RTX 3060+ or Apple M1/M2
  • 50GB+ SSD storage

Optimal (30B+ models):

  • 32GB+ RAM
  • NVIDIA RTX 4080+ or Apple M2 Ultra
  • 100GB+ NVMe SSD

See System Requirements for detailed specifications.

Does Inferno require internet?

No! Inferno runs completely offline after initial setup:

Requires Internet:

  • Downloading Inferno itself
  • Downloading AI models (one-time)
  • Software updates (optional)

Works Offline:

  • All AI inference and chat
  • Model conversion and management
  • API services and batch processing
  • Everything after initial setup

Perfect for air-gapped environments, remote locations, or privacy-critical applications.

How do I get better performance?

Hardware Upgrades:

  • GPU: Single biggest performance boost (10-50x faster)
  • SSD: Faster model loading and caching
  • RAM: Handle larger models and context

Software Optimizations:

  • Model Choice: Smaller models (7B vs 70B) run much faster
  • Quantization: Q4_0 models use less memory, run faster
  • Context Size: Reduce context_size in config
  • Batch Size: Tune batch_size for your hardware
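The context_size and batch_size options above are set in Inferno's config file. A hypothetical sketch of the relevant section, using the same `key = value` syntax shown later in this FAQ (the exact schema is an assumption):

```toml
# Hypothetical config sketch -- key names mirror the options
# mentioned above; the exact file layout is an assumption.
context_size = 2048   # smaller context uses less memory, loads faster
batch_size = 8        # tune up or down for your CPU/GPU
```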

See Performance Tuning for detailed optimization guide.


🔧 Setup & Installation

Which installation method should I choose?

Docker (Recommended):

  • ✅ Fastest setup (5 minutes)
  • ✅ Consistent environment
  • ✅ Easy updates and backups
  • ✅ Works on any platform

Pre-built Binaries:

  • ✅ No Docker required
  • ✅ Native performance
  • ⚠️ Manual dependency management

Build from Source:

  • ✅ Latest features
  • ✅ Custom optimizations
  • ⚠️ Requires development tools (30+ minutes)

Choose Docker unless you have specific requirements.

How do I download models?

# Using Inferno's built-in downloader
inferno models download llama-2-7b-chat
inferno models download mistral-7b-instruct

# List available models
inferno models available

# Manual download (from Hugging Face, etc.)
# Place .gguf or .onnx files in your models directory

Popular Model Sources:

Where are models stored?

Default Locations:

  • Linux: ~/.local/share/inferno/models/
  • macOS: ~/Library/Application Support/inferno/models/
  • Windows: %APPDATA%\inferno\models\
  • Docker: /data/models/ (mapped to host volume)

Custom Location:

# Via command line
inferno serve --models-dir /path/to/models

# Via config file
models_dir = "/custom/path/models"

# Via environment variable
export INFERNO_MODELS_DIR="/custom/path/models"

How do I update Inferno?

Docker:

docker pull inferno:latest
docker stop inferno && docker rm inferno
docker run -d --name inferno -p 8080:8080 inferno:latest serve

Binary:

# Download latest release
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-x86_64.tar.gz
tar xzf inferno-linux-x86_64.tar.gz
sudo mv inferno /usr/local/bin/

Source:

cd inferno
git pull
cargo build --release
sudo cp target/release/inferno /usr/local/bin/

๐Ÿ” Privacy & Security

How private is Inferno really?

100% Private by Design:

  • ✅ No data transmission - Everything runs locally
  • ✅ No telemetry - Inferno doesn't "phone home"
  • ✅ No cloud dependencies - Works completely offline
  • ✅ Open source - Audit the code yourself
  • ✅ Your infrastructure - You control everything

Even more private than:

  • Self-hosted cloud solutions (no internet required)
  • VPN + cloud AI (no external connections)
  • Enterprise AI platforms (no vendor access)

Is Inferno secure?

Security Features:

  • ✅ Input validation - All user inputs sanitized
  • ✅ Memory safety - Written in Rust (a memory-safe language)
  • ✅ Authentication - JWT tokens, API keys, RBAC
  • ✅ Audit logging - Track all operations
  • ✅ Rate limiting - Prevent abuse
  • ✅ TLS/HTTPS - Encrypted communications

Security Best Practices:

  • Enable authentication for production use
  • Use HTTPS/TLS for network access
  • Keep Inferno updated
  • Follow Security Hardening guide
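With authentication enabled, clients must attach credentials to each request. A minimal sketch of building an authenticated request payload, assuming a bearer-token Authorization header (the header scheme is an assumption, not confirmed Inferno API; YOUR_API_KEY is a placeholder):

```python
# Hypothetical sketch: request headers and body for an auth-enabled
# server. The Authorization scheme here is an assumption.
import json

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
}
body = json.dumps({
    "model": "your-model",
    "messages": [{"role": "user", "content": "Hello!"}],
})
```

The same token would go in the `api_key` field of an OpenAI client, as shown in the integration example later in this FAQ.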

Can I use Inferno for sensitive data?

Yes! Inferno is designed for sensitive use cases:

Industries Using Inferno:

  • Healthcare - HIPAA-compliant AI analysis
  • Finance - PCI DSS compliant document processing
  • Legal - Attorney-client privileged document review
  • Government - Classified/sensitive information processing
  • Research - Proprietary data analysis

Compliance Features:

  • Air-gapped deployment support
  • Comprehensive audit trails
  • Data residency control
  • No external dependencies

🚀 Usage & Integration

How do I integrate Inferno with my application?

OpenAI-Compatible API:

# Use any OpenAI client library
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Or your actual API key if auth enabled
)

response = client.chat.completions.create(
    model="your-model",
    messages=[{"role": "user", "content": "Hello!"}]
)

Native REST API:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [...]}'

WebSocket Streaming:

const ws = new WebSocket('ws://localhost:8080/ws/stream');
// Wait for the connection to open before sending, or send() will throw
ws.onopen = () => ws.send(JSON.stringify({model: "your-model", prompt: "Hello!"}));
ws.onmessage = (event) => console.log(event.data);  // streamed tokens

Can I run multiple models simultaneously?

Yes! Inferno supports multiple approaches:

Single Instance, Multiple Models:

# Load multiple models
inferno models load llama-2-7b
inferno models load codellama-13b

# Use different models in requests
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "llama-2-7b", "messages": [...]}'

curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "codellama-13b", "messages": [...]}'

Multiple Instances:

# Run different models on different ports
inferno serve --model llama-2-7b --bind 0.0.0.0:8080 &
inferno serve --model codellama-13b --bind 0.0.0.0:8081 &

Load Balancing:

# Distribute requests across multiple instances
inferno serve --distributed --workers 4

How do I process files in batch?

# Process multiple prompts from a file
inferno run --model your-model --batch --input prompts.txt --output responses.jsonl

# Process documents
inferno run --model your-model --batch --input documents/ --output summaries.json

# Scheduled batch processing
inferno batch-queue create --schedule "0 2 * * *" --input daily_reports/ --model summarizer  # cron syntax: daily at 2:00 AM

Input Formats:

  • Text files - One prompt per line
  • JSON Lines - Structured prompts
  • CSV - Tabular data
  • Directory - Process all files in a folder
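For the JSON Lines format, each line is one standalone JSON object. An illustrative sketch of writing and reading such a file (the field names are assumptions for illustration, not a documented Inferno schema):

```python
# Illustrative JSON Lines batch input: one JSON object per line.
# Field names are assumptions, not a documented Inferno schema.
import json

prompts = [
    {"prompt": "Summarize this report.", "max_tokens": 256},
    {"prompt": "Translate to French: hello", "max_tokens": 64},
]
with open("prompts.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps(p) + "\n")

# Reading it back, one record per line
with open("prompts.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```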

๐Ÿ› ๏ธ Troubleshooting

Inferno won't start

Check Common Issues:

# Verify installation
inferno --version

# Check port availability
sudo lsof -i :8080

# Try different port
inferno serve --bind 0.0.0.0:8081

# Check permissions
ls -la /usr/local/bin/inferno

# View detailed logs
RUST_LOG=debug inferno serve

Models won't load

# Verify model exists
inferno models list

# Check model format
file your-model.gguf  # Should show GGUF format

# Check available memory
free -h  # Linux
vm_stat  # macOS

# Try smaller model
inferno models download microsoft/DialoGPT-small
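The `file` check above works because GGUF files begin with the 4-byte ASCII magic "GGUF". A minimal sketch of the same check in Python, useful where `file` doesn't recognize the format:

```python
# Minimal sketch: verify a model file's magic bytes.
# GGUF files start with the 4-byte ASCII magic "GGUF".
def is_gguf(path):
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

A file that fails this check was likely truncated mid-download or is in a different format (e.g. raw SafeTensors) and needs conversion first.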

Poor performance

Check Resource Usage:

# Monitor during inference
top       # CPU usage
nvidia-smi  # GPU usage (NVIDIA)
iostat    # Disk I/O

Common Solutions:

  • Use smaller model (7B instead of 13B+)
  • Enable GPU acceleration
  • Reduce context_size in config
  • Use quantized models (Q4_0, Q5_0)
  • Add more RAM or switch to SSD
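A quick way to sanity-check whether a model even fits in RAM: weight memory is roughly parameter count times bits per weight. This rule-of-thumb sketch ignores overhead such as the KV cache and activations, so treat the result as a lower bound:

```python
# Rule-of-thumb sketch: approximate weight memory in GB from
# parameter count (billions) and quantization bits per weight.
# Ignores KV cache and activation overhead -- a lower bound.
def approx_weight_gb(params_billions, bits_per_weight):
    # 1e9 params * (bits / 8) bytes, expressed in GB
    return params_billions * bits_per_weight / 8

# A 7B model: 14 GB at fp16, but only 3.5 GB at 4-bit (Q4_0-class)
```

This is why a 4-bit quantized 7B model fits comfortably in the 8GB minimum spec while the same model at fp16 does not.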

API returns errors

# Test basic connectivity
curl http://localhost:8080/health

# Check server logs
docker logs inferno  # Docker
journalctl -f -u inferno  # systemd

# Verify model name
inferno models list

# Test with simple request
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [{"role": "user", "content": "test"}]}'

💼 Licensing & Commercial Use

What about commercial use?

Inferno License: MIT/Apache 2.0 - Free for commercial use!

Model Licenses: Check individual model licenses

  • Open Models (Llama 2, Mistral, etc.) - Free commercial use
  • Commercial Models - May require paid licenses
  • Custom Models - Your own license terms

Can I sell applications built with Inferno?

Yes! The MIT/Apache 2.0 license allows:

  • ✅ Commercial use
  • ✅ Modification and distribution
  • ✅ Private use
  • ✅ Selling applications that include Inferno
  • ✅ SaaS applications using Inferno

Just remember:

  • Include license notice in your distributions
  • Check individual model licenses
  • No warranty provided (standard open source terms)

🎓 Learning & Community

I'm new to AI. Where should I start?

Learning Path:

  1. Start Here: Quick Start Tutorial - Get your first AI working
  2. Understand Basics: Usage Examples - See what's possible
  3. Explore Models: Model Management - Try different AI capabilities
  4. Optimize: Performance Tuning - Make it faster
  5. Production: Production Deployment - Scale up

Recommended Reading:

How can I contribute to Inferno?

Ways to Help:

Where can I get help?

Community Resources:

  • GitHub Discussions - Community help and questions (fastest response)
  • GitHub Issues - Bug reports
  • Wiki - This documentation you're reading!
  • Enterprise Support - Contact maintainer for specialized installation assistance (information and pricing available)

Response Times:

  • GitHub Discussions - Community-driven, usually within hours
  • GitHub Issues - 1-3 days for bug reports and technical issues
  • Wiki - Community maintained, always available

🔮 Future & Roadmap

What's planned for future versions?

Near Term (v1.1-1.2):

  • Visual web dashboard
  • Model marketplace integration
  • Enhanced multi-GPU support
  • Performance optimizations

Medium Term (v1.3-1.5):

  • Multi-modal support (vision, audio)
  • Advanced clustering and scaling
  • Enterprise SSO integration
  • Advanced monitoring and analytics

Long Term (v2.0+):

  • Federated learning support
  • Edge device deployment
  • Custom model training integration
  • Advanced AI workflow automation

How stable is Inferno?

Current Status: Production ready for most use cases

Stability:

  • ✅ Core Features - Stable, well-tested
  • ✅ GGUF/ONNX Support - Production ready
  • ✅ API Compatibility - OpenAI-compatible, stable
  • ⚠️ Advanced Features - Some features still evolving
  • ⚠️ Breaking Changes - Per semver, reserved for major releases; still, test before upgrading

Upgrade Recommendations:

  • Production - Pin to specific versions, test upgrades
  • Development - Use latest for newest features
  • Enterprise - Consider LTS releases when available

FAQ last updated for Inferno v1.0.0. Question not answered? Visit GitHub Discussions for community help!
