

Ryan Robson edited this page Sep 16, 2025 · 2 revisions

πŸš€ Quick Start Tutorial

Get your first AI conversation running in 10 minutes! This tutorial walks you through everything from installation to your first AI response.

🎯 What You'll Accomplish

By the end of this tutorial, you'll have:

  • βœ… Inferno running on your system
  • βœ… A working AI model loaded
  • βœ… Successfully generated AI responses
  • βœ… Understanding of basic commands

  • Time Required: 10-15 minutes
  • Skill Level: Beginner
  • Prerequisites: None!


Step 1: Install Inferno (2 minutes)

Choose the fastest method for your system:

🐳 Docker (Recommended - Fastest)

# One command to rule them all!
docker run -d --name inferno -p 8080:8080 inferno:latest serve --demo

# Verify it's running
curl http://localhost:8080/health
# Should return: {"status":"healthy"}

πŸ“¦ Pre-built Binary (Alternative)

# Linux/macOS
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-$(uname -s)-$(uname -m).tar.gz
tar xzf inferno-*.tar.gz && sudo mv inferno /usr/local/bin/

# Verify installation
inferno --version

βœ… Checkpoint: You should see version information or a healthy status response.


Step 2: Get Your First Model (3 minutes)

Let's download a small, fast model perfect for testing:

# Download a small conversational model (great for testing)
inferno models download microsoft/DialoGPT-small

# Alternative: if you're using the Docker demo, a model is already built in.
# Either way, confirm what's available:
inferno models list

Expected Output:

Available models:
microsoft-DialoGPT-small  gguf  1.2 GB  2024-12-16 10:30

βœ… Checkpoint: You should see at least one model in your list.


Step 3: Start the Server (1 minute)

If not using Docker, start the Inferno server:

# Start the server
inferno serve --bind 0.0.0.0:8080

# You should see:
# [INFO] Starting HTTP server on 0.0.0.0:8080
# [INFO] HTTP API server is running on http://0.0.0.0:8080

Keep this terminal open and open a new one for the next steps.

βœ… Checkpoint: Server should be running without errors.


Step 4: Your First AI Conversation (2 minutes)

Method 1: Simple cURL Request

# Your first AI question!
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft-DialoGPT-small",
    "messages": [
      {"role": "user", "content": "Hello! What can you help me with?"}
    ],
    "max_tokens": 100
  }'
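If you'd rather drive the API from code, the same payload can be assembled and the reply extracted in a few lines of Python. The response shape below follows the standard OpenAI chat-completions format; the sample body is illustrative, not actual Inferno output:

```python
import json

# Build the same chat-completions payload programmatically
payload = {
    "model": "microsoft-DialoGPT-small",
    "messages": [
        {"role": "user", "content": "Hello! What can you help me with?"}
    ],
    "max_tokens": 100,
}

# A typical OpenAI-style response body (illustrative sample)
sample_response = json.loads("""
{
  "choices": [
    {"message": {"role": "assistant", "content": "Hi! I can answer questions and chat."}}
  ]
}
""")

# The generated text lives at choices[0].message.content
reply = sample_response["choices"][0]["message"]["content"]
print(reply)
```

The same `payload` dict can be sent with any HTTP client (e.g. `requests.post`) to the endpoint shown in the cURL example.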

Method 2: Command Line Interface

# Even simpler - use the CLI
inferno run --model microsoft-DialoGPT-small --prompt "Hello! What can you help me with?"

Method 3: Interactive Mode

# Start interactive chat session
inferno chat --model microsoft-DialoGPT-small

# Type your messages and press Enter
# Type 'exit' to quit

βœ… Checkpoint: You should receive an AI-generated response!


Step 5: Try Different Features (2 minutes)

Streaming Responses (Watch AI "Think")

# See the response generate in real-time
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft-DialoGPT-small",
    "messages": [{"role": "user", "content": "Tell me a short story about AI"}],
    "stream": true
  }'
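Streamed responses arrive as Server-Sent Events, with each `data:` line carrying a JSON chunk. Assuming Inferno follows the standard OpenAI streaming format, a client can stitch the deltas back together like this (the transcript below is an illustrative sample, not real output):

```python
import json

# What an OpenAI-style SSE stream looks like on the wire (sample)
raw_stream = """data: {"choices": [{"delta": {"content": "Once"}}]}

data: {"choices": [{"delta": {"content": " upon a time..."}}]}

data: [DONE]
"""

pieces = []
for line in raw_stream.splitlines():
    line = line.strip()
    if not line.startswith("data: "):
        continue  # skip blank keep-alive lines
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(data)
    # Each chunk carries an incremental piece of the reply in delta.content
    pieces.append(chunk["choices"][0]["delta"].get("content", ""))

story = "".join(pieces)
print(story)
```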

Different Types of Requests

# Code generation
inferno run --model microsoft-DialoGPT-small \
  --prompt "Write a Python function that calculates fibonacci numbers"

# Question answering
inferno run --model microsoft-DialoGPT-small \
  --prompt "What are the benefits of running AI locally?"

# Creative writing
inferno run --model microsoft-DialoGPT-small \
  --prompt "Write a haiku about privacy and AI"
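Each of these CLI calls maps onto the same chat-completions payload, with only the prompt changing, so scripting a batch of tasks is straightforward. A minimal sketch (the model name and token limit here are just examples):

```python
# Each CLI call above corresponds to the same JSON request body shape
prompts = [
    "Write a Python function that calculates fibonacci numbers",
    "What are the benefits of running AI locally?",
    "Write a haiku about privacy and AI",
]

def make_payload(prompt, model="microsoft-DialoGPT-small", max_tokens=200):
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payloads = [make_payload(p) for p in prompts]
print(len(payloads))
```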

πŸŽ‰ Congratulations!

You've successfully:

  • βœ… Installed Inferno
  • βœ… Downloaded and loaded a model
  • βœ… Generated AI responses
  • βœ… Tried different interaction methods

πŸš€ What's Next?

Immediate Next Steps

  1. Try a Better Model: Model Management - Download larger, more capable models
  2. Explore Features: Usage Examples - See real-world use cases
  3. Optimize Performance: Performance Tuning - Make it faster for your hardware

Common Next Actions

For Developers:

# Use any OpenAI client library with Inferno's OpenAI-compatible endpoint
import openai
client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="microsoft-DialoGPT-small",
    messages=[{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)

For Power Users:

# Enable authentication and monitoring
inferno serve --auth --metrics --config production.toml

For Businesses:

# Deploy with Docker Compose (includes monitoring)
wget https://raw.githubusercontent.com/ringo380/inferno/main/docker-compose.yml
docker-compose up -d

πŸ› οΈ Customize Your Setup

Configuration File

Create inferno.toml for persistent settings:

# ~/.config/inferno/inferno.toml
models_dir = "/home/user/ai-models"
log_level = "info"

[server]
bind_address = "0.0.0.0"
port = 8080

[backend_config]
gpu_enabled = true
context_size = 4096

Environment Variables

# Quick configuration via environment
export INFERNO_LOG_LEVEL=debug
export INFERNO_MODELS_DIR="/custom/path/models"

# Start with custom config
inferno serve
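A common convention, and a reasonable assumption here (check Inferno's configuration docs to confirm), is that environment variables override the config file, which in turn overrides built-in defaults. That lookup order can be sketched as:

```python
import os

def resolve(env_var, file_value, default):
    """Environment variable wins over config-file value, which wins over the default.
    Note: this precedence is an assumption about Inferno's behavior, not confirmed."""
    if env_var in os.environ:
        return os.environ[env_var]
    return file_value if file_value is not None else default

os.environ["INFERNO_LOG_LEVEL"] = "debug"  # as exported above
log_level = resolve("INFERNO_LOG_LEVEL", "info", "warn")
print(log_level)
```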

πŸ› Troubleshooting Your First Run

"Model not found"

# List available models
inferno models list

# Download if missing
inferno models download model-name

"Connection refused"

# Check if server is running
curl http://localhost:8080/health

# Check what's using port 8080
sudo lsof -i :8080

# Use different port
inferno serve --bind 0.0.0.0:8081
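If `lsof` isn't available on your system, a few lines of Python can tell you whether a port is already taken (demonstrated here against a throwaway listener so it runs anywhere):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

# Demonstrate with a throwaway listener (the OS picks a free port)
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
busy_port = listener.getsockname()[1]

busy = port_in_use(busy_port)  # True while the listener is open
listener.close()
print(busy)
```

Swap in `8080` for `busy_port` to check Inferno's default port on your machine.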

"Out of memory"

# Use smaller model
inferno models download microsoft/DialoGPT-small

# Reduce context size
inferno serve --context-size 2048

Docker Issues

# Check Docker status
docker ps

# View logs
docker logs inferno

# Restart container
docker restart inferno

πŸ’‘ Pro Tips for New Users

Performance Tips

  • Start Small: Use 7B models first, then upgrade
  • SSD Storage: Store models on SSD for faster loading
  • GPU Memory: Monitor GPU usage with nvidia-smi

Usage Tips

  • Batch Processing: Use --batch for multiple prompts
  • Save Responses: Use --output file.txt to save results
  • Custom Instructions: Create prompt templates for consistency
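For the prompt-template tip above, even a plain format string goes a long way toward consistent instructions. A minimal sketch (the template wording is just an example):

```python
# One reusable template keeps instructions consistent across prompts
SUMMARY_TEMPLATE = (
    "You are a concise assistant. Summarize the following text "
    "in at most {max_sentences} sentences:\n\n{text}"
)

def build_prompt(text, max_sentences=3):
    """Fill the template; pass the result to `inferno run --prompt`."""
    return SUMMARY_TEMPLATE.format(max_sentences=max_sentences, text=text)

prompt = build_prompt("Inferno runs AI models locally for privacy and control.")
print(prompt)
```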

Exploration Ideas

# Try different AI tasks
inferno run --model your-model --prompt "Summarize: [paste long text]"
inferno run --model your-model --prompt "Translate to Spanish: Hello world"
inferno run --model your-model --prompt "Extract key points from: [paste content]"
inferno run --model your-model --prompt "Fix this code: [paste code]"


❓ Need Help?

  • Quick Questions: Check our FAQ
  • Technical Issues: See Troubleshooting
  • Community Help: Visit GitHub Discussions
  • Bug Reports: GitHub Issues


🎊 Welcome to the Inferno Community!

You're now part of a growing community of developers, researchers, and organizations using AI while maintaining privacy and control.

Share your success on GitHub Discussions - we love hearing about your use cases!


Tutorial last updated for Inferno v1.0.0. Found this helpful? Consider Contributing to Wiki to help others!
