FAQ
Common questions and answers about Inferno. Can't find what you're looking for? Visit GitHub Discussions!
Inferno is a production-ready AI inference server that runs entirely on your local hardware. Think of it as your private ChatGPT that works offline, supports multiple model formats, and gives you complete control over your AI infrastructure.
| Aspect | Cloud AI (OpenAI, etc.) | Inferno |
|---|---|---|
| Privacy | Data sent to external servers | 100% local processing |
| Internet | Required for every request | Works completely offline |
| Models | Limited to provider's models | Use any model you want |
| Speed | Network dependent | Local hardware speed |
| Control | Limited customization | Full control over everything |
Inferno itself is free: it is completely open source under the MIT and Apache 2.0 licenses. You pay only for:
- Your hardware (computer, GPU)
- Electricity to run it
- Models (many are free, some commercial models require licenses)
No subscription fees, no per-token costs, no vendor lock-in.
Typical uses:
- Chat with AI models (like ChatGPT, but private)
- Generate code (coding assistants, debugging)
- Process documents (summarization, analysis, translation)
- Create content (writing, brainstorming, creative tasks)
- Automate workflows (batch processing, API integration)
- Build AI-powered applications (embed in your software)
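The last two items boil down to calling Inferno's HTTP API. As a taste, here is a stdlib-only Python sketch that builds the kind of OpenAI-style chat request shown later in this FAQ; it only constructs the request object, so no server is contacted:

```python
import json
import urllib.request

def build_chat_request(model: str, text: str) -> urllib.request.Request:
    """Prepare an OpenAI-style chat request for a local Inferno server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": text}],
    }
    return urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("your-model", "Hello!")
print(req.get_method())  # POST
```

Sending it with `urllib.request.urlopen(req)` (or any HTTP client) against a running server returns the completion as JSON.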
Format Support:
- ✅ GGUF - Optimized format for CPU/GPU inference (llama.cpp compatible)
- ✅ ONNX - Cross-platform ML format (PyTorch, TensorFlow, scikit-learn exports)
- ✅ PyTorch - Via conversion to GGUF/ONNX
- ✅ SafeTensors - Via conversion to GGUF/ONNX
Popular Models:
- Llama 2 (7B, 13B, 70B) - General purpose, code, chat
- Mistral (7B, 8x7B) - High performance, multilingual
- CodeLlama (7B, 13B, 34B) - Code generation and completion
- Vicuna, Alpaca, WizardLM - Various specialized models
- Custom models - Any GGUF or ONNX model
Minimum (7B models):
- 8GB RAM, any modern CPU
- 20GB storage space
- No GPU required (but helpful)
Recommended (13B models):
- 16GB RAM, multi-core CPU
- NVIDIA RTX 3060+ or Apple M1/M2
- 50GB+ SSD storage
Optimal (30B+ models):
- 32GB+ RAM
- NVIDIA RTX 4080+ or Apple M2 Ultra
- 100GB+ NVMe SSD
See System Requirements for detailed specifications.
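A rough way to sanity-check these tiers: a quantized model's weights need roughly the parameter count times the bits per weight. The sketch below is a back-of-the-envelope estimate, not Inferno's actual memory accounting; the 4.5 bits/weight figure approximates Q4_0 quantization (4-bit weights plus a per-block scale), and real usage adds context and runtime overhead:

```python
def approx_model_ram_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Back-of-the-envelope RAM estimate for a quantized model's weights."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 7B model at Q4_0 quantization: weights alone need about 4 GB,
# which is why 8GB RAM is a workable minimum for this class.
print(round(approx_model_ram_gb(7), 1))  # 3.9
```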
Inferno does not need the internet for normal operation; it runs completely offline after initial setup:
Requires Internet:
- Downloading Inferno itself
- Downloading AI models (one-time)
- Software updates (optional)
Works Offline:
- All AI inference and chat
- Model conversion and management
- API services and batch processing
- Everything after initial setup
Perfect for air-gapped environments, remote locations, or privacy-critical applications.
Hardware Upgrades:
- GPU: Single biggest performance boost (10-50x faster)
- SSD: Faster model loading and caching
- RAM: Handle larger models and context
Software Optimizations:
- Model Choice: Smaller models (7B vs 70B) run much faster
- Quantization: Q4_0 models use less memory and run faster
- Context Size: Reduce `context_size` in your config
- Batch Size: Tune `batch_size` for your hardware
See Performance Tuning for detailed optimization guide.
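As a sketch, the settings above might look like this in Inferno's config file. The key names `context_size`, `batch_size`, and `models_dir` appear elsewhere in this FAQ; the values and exact file layout are assumptions, so check the Performance Tuning guide for the real schema:

```toml
# Illustrative values only - tune for your hardware
context_size = 2048   # smaller context window lowers memory use
batch_size = 256      # larger batches help GPUs, hurt low-RAM CPUs
models_dir = "/custom/path/models"
```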
Docker (Recommended):
- ✅ Fastest setup (5 minutes)
- ✅ Consistent environment
- ✅ Easy updates and backups
- ✅ Works on any platform
Pre-built Binaries:
- ✅ No Docker required
- ✅ Native performance
- ⚠️ Manual dependency management
Build from Source:
- ✅ Latest features
- ✅ Custom optimizations
- ⚠️ Requires development tools (30+ minutes)
Choose Docker unless you have specific requirements.
```bash
# Using Inferno's built-in downloader
inferno models download llama-2-7b-chat
inferno models download mistral-7b-instruct

# List available models
inferno models available

# Manual download (from Hugging Face, etc.)
# Place .gguf or .onnx files in your models directory
```

Popular Model Sources:
- Hugging Face - Largest collection
- TheBloke - High-quality GGUF conversions
- Microsoft - Official Microsoft models
- Meta - Official Llama models
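Downloads from these sources occasionally arrive truncated or mislabeled. One quick sanity check, shown here as an illustrative Python sketch (not part of Inferno), is the file's magic number: valid GGUF files begin with the four ASCII bytes `GGUF`:

```python
from pathlib import Path

def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo with a throwaway file standing in for a real download
Path("demo.gguf").write_bytes(b"GGUF" + b"\x00" * 16)
print(looks_like_gguf("demo.gguf"))  # True
```

The `file` command used later in the troubleshooting section performs the same check.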
Default Locations:
- Linux: `~/.local/share/inferno/models/`
- macOS: `~/Library/Application Support/inferno/models/`
- Windows: `%APPDATA%\inferno\models\`
- Docker: `/data/models/` (mapped to a host volume)
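When a client or script needs to find the model files, the per-platform defaults above can be resolved programmatically. This is an illustrative sketch mirroring the list (Inferno resolves the path internally, so treat this as a convenience, not its actual implementation):

```python
import os
import sys
from pathlib import Path

def default_models_dir() -> Path:
    """Mirror Inferno's default model locations per platform."""
    if sys.platform == "darwin":  # macOS
        return Path.home() / "Library" / "Application Support" / "inferno" / "models"
    if os.name == "nt":  # Windows: %APPDATA%\inferno\models
        return Path(os.environ.get("APPDATA", "")) / "inferno" / "models"
    # Linux and other Unix-likes
    return Path.home() / ".local" / "share" / "inferno" / "models"

print(default_models_dir())
```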
Custom Location:
```bash
# Via command line
inferno serve --models-dir /path/to/models
```

```toml
# Via config file
models_dir = "/custom/path/models"
```

```bash
# Via environment variable
export INFERNO_MODELS_DIR="/custom/path/models"
```

To update Inferno:
Docker:
```bash
docker pull inferno:latest
docker stop inferno && docker rm inferno
docker run -d --name inferno -p 8080:8080 inferno:latest serve
```

Binary:
```bash
# Download latest release
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-x86_64.tar.gz
tar xzf inferno-linux-x86_64.tar.gz
sudo mv inferno /usr/local/bin/
```

Source:
```bash
cd inferno
git pull
cargo build --release
sudo cp target/release/inferno /usr/local/bin/
```

100% Private by Design:
- ✅ No data transmission - Everything runs locally
- ✅ No telemetry - Inferno doesn't "phone home"
- ✅ No cloud dependencies - Works completely offline
- ✅ Open source - Audit the code yourself
- ✅ Your infrastructure - You control everything
Even more private than:
- Self-hosted cloud solutions (no internet required)
- VPN + cloud AI (no external connections)
- Enterprise AI platforms (no vendor access)
Security Features:
- ✅ Input validation - All user inputs sanitized
- ✅ Memory safety - Written in Rust (a memory-safe language)
- ✅ Authentication - JWT tokens, API keys, RBAC
- ✅ Audit logging - Track all operations
- ✅ Rate limiting - Prevent abuse
- ✅ TLS/HTTPS - Encrypted communications
Security Best Practices:
- Enable authentication for production use
- Use HTTPS/TLS for network access
- Keep Inferno updated
- Follow Security Hardening guide
Inferno is designed with sensitive use cases in mind:
Industries Using Inferno:
- Healthcare - HIPAA-compliant AI analysis
- Finance - PCI DSS compliant document processing
- Legal - Attorney-client privileged document review
- Government - Classified/sensitive information processing
- Research - Proprietary data analysis
Compliance Features:
- Air-gapped deployment support
- Comprehensive audit trails
- Data residency control
- No external dependencies
OpenAI-Compatible API:
```python
# Use any OpenAI client library
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Or your actual API key if auth enabled
)

response = client.chat.completions.create(
    model="your-model",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Native REST API:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [...]}'
```

WebSocket Streaming:
```javascript
const ws = new WebSocket('ws://localhost:8080/ws/stream');
ws.send(JSON.stringify({model: "your-model", prompt: "Hello!"}));
```

Inferno can serve multiple models at once, with several approaches:
Single Instance, Multiple Models:
```bash
# Load multiple models
inferno models load llama-2-7b
inferno models load codellama-13b

# Use different models in requests
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "llama-2-7b", "messages": [...]}'
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "codellama-13b", "messages": [...]}'
```

Multiple Instances:
```bash
# Run different models on different ports
inferno serve --model llama-2-7b --bind 0.0.0.0:8080 &
inferno serve --model codellama-13b --bind 0.0.0.0:8081 &
```

Load Balancing:
```bash
# Distribute requests across multiple instances
inferno serve --distributed --workers 4
```

Batch processing:
```bash
# Process multiple prompts from a file
inferno run --model your-model --batch --input prompts.txt --output responses.jsonl

# Process documents
inferno run --model your-model --batch --input documents/ --output summaries.json

# Scheduled batch processing
inferno batch-queue create --schedule "0 2 * * *" --input daily_reports/ --model summarizer
```

Input Formats:
- Text files - One prompt per line
- JSON Lines - Structured prompts
- CSV - Tabular data
- Directory - Process all files in a folder
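For the JSON Lines format, each line is one standalone JSON object. Here is a small Python sketch that writes such a prompt file; the `prompt` field name is an assumption for illustration, so check `inferno run --help` for the schema Inferno actually expects:

```python
import json

prompts = [
    {"prompt": "Summarize: quarterly revenue grew 12%."},
    {"prompt": "Translate to French: good morning."},
]

# One JSON object per line - no enclosing array, no trailing commas
with open("prompts.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps(p) + "\n")

# Round-trip to confirm each line parses independently
with open("prompts.jsonl") as f:
    rows = [json.loads(line) for line in f]
print(len(rows))  # 2
```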
Check Common Issues:
```bash
# Verify installation
inferno --version

# Check port availability
sudo lsof -i :8080

# Try a different port
inferno serve --bind 0.0.0.0:8081

# Check permissions
ls -la /usr/local/bin/inferno

# View detailed logs
RUST_LOG=debug inferno serve
```

If a model won't load:
```bash
# Verify the model exists
inferno models list

# Check model format
file your-model.gguf  # Should show GGUF format

# Check available memory
free -h   # Linux
vm_stat   # macOS

# Try a smaller model
inferno models download microsoft/DialoGPT-small
```

Check Resource Usage:
```bash
# Monitor during inference
top         # CPU usage
nvidia-smi  # GPU usage (NVIDIA)
iostat      # Disk I/O
```

Common Solutions:
- Use smaller model (7B instead of 13B+)
- Enable GPU acceleration
- Reduce context_size in config
- Use quantized models (Q4_0, Q5_0)
- Add more RAM or switch to SSD
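When the API seems unresponsive, it helps to distinguish a server that is slow to start from one that is not listening at all. This stdlib-only poller is an illustrative sketch; only the `/health` endpoint itself comes from this FAQ:

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url: str, attempts: int = 5, delay: float = 1.0) -> bool:
    """Poll a health URL until it returns HTTP 200 or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet; retry after a short pause
        time.sleep(delay)
    return False

# Example: wait_for_server("http://localhost:8080/health")
```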
```bash
# Test basic connectivity
curl http://localhost:8080/health

# Check server logs
docker logs inferno        # Docker
journalctl -f -u inferno   # systemd

# Verify model name
inferno models list

# Test with a simple request
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [{"role": "user", "content": "test"}]}'
```

Inferno License: MIT/Apache 2.0 - Free for commercial use!
Model Licenses: Check individual model licenses
- Open Models (Llama 2, Mistral, etc.) - Free commercial use
- Commercial Models - May require paid licenses
- Custom Models - Your own license terms
Inferno can be used in commercial products. The MIT/Apache 2.0 licenses allow:
- ✅ Commercial use
- ✅ Modification and distribution
- ✅ Private use
- ✅ Selling applications that include Inferno
- ✅ SaaS applications using Inferno
Just remember:
- Include license notice in your distributions
- Check individual model licenses
- No warranty provided (standard open source terms)
Learning Path:
- Start Here: Quick Start Tutorial - Get your first AI working
- Understand Basics: Usage Examples - See what's possible
- Explore Models: Model Management - Try different AI capabilities
- Optimize: Performance Tuning - Make it faster
- Production: Production Deployment - Scale up
Recommended Reading:
- Hugging Face Course - Free AI/ML course
- Fast.ai - Practical deep learning
- OpenAI Cookbook - AI application patterns
Ways to Help:
- Report Issues - Found a bug? Report it
- Improve Docs - Contributing to Wiki
- Share Examples - Post your use cases in Discussions
- Write Code - See CONTRIBUTING.md
- Help Others - Answer questions in GitHub Discussions
Community Resources:
- GitHub Discussions - Community help and questions (fastest response)
- GitHub Issues - Bug reports
- Wiki - This documentation you're reading!
- Enterprise Support - Contact the maintainer for specialized installation assistance (details and pricing available on request)
Response Times:
- GitHub Discussions - Community-driven, usually within hours
- GitHub Issues - 1-3 days for bug reports and technical issues
- Wiki - Community maintained, always available
Near Term (v1.1-1.2):
- Visual web dashboard
- Model marketplace integration
- Enhanced multi-GPU support
- Performance optimizations
Medium Term (v1.3-1.5):
- Multi-modal support (vision, audio)
- Advanced clustering and scaling
- Enterprise SSO integration
- Advanced monitoring and analytics
Long Term (v2.0+):
- Federated learning support
- Edge device deployment
- Custom model training integration
- Advanced AI workflow automation
Current Status: Production ready for most use cases
Stability:
- ✅ Core Features - Stable, well-tested
- ✅ GGUF/ONNX Support - Production ready
- ✅ API Compatibility - OpenAI-compatible, stable
- ⚠️ Advanced Features - Some features still evolving
- ⚠️ Breaking Changes - Possible in minor versions (following semver)
Upgrade Recommendations:
- Production - Pin to specific versions, test upgrades
- Development - Use latest for newest features
- Enterprise - Consider LTS releases when available
FAQ last updated for Inferno v1.0.0. Question not answered? Visit GitHub Discussions for community help!