# Quick Start Tutorial

Get your first AI conversation running in 10 minutes! This tutorial walks you through everything from installation to your first AI response.

By the end of this tutorial, you'll have:

- ✅ Inferno running on your system
- ✅ A working AI model loaded
- ✅ Successfully generated AI responses
- ✅ An understanding of basic commands

**Time required:** 10-15 minutes · **Skill level:** Beginner · **Prerequisites:** None!
## Step 1: Install Inferno

Choose the fastest method for your system.

**Docker:**

```bash
# One command to rule them all!
docker run -d --name inferno -p 8080:8080 inferno:latest serve --demo

# Verify it's running
curl http://localhost:8080/health
# Should return: {"status":"healthy"}
```

**Prebuilt binary:**

```bash
# Linux/macOS
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-$(uname -s)-$(uname -m).tar.gz
tar xzf inferno-*.tar.gz && sudo mv inferno /usr/local/bin/

# Verify installation
inferno --version
```

✅ **Checkpoint:** You should see version information or a healthy status response.
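If the health check fails immediately after starting the container, the server may still be initializing. A small Python sketch that polls the `/health` endpoint until it reports healthy (the helper name is illustrative; it assumes the default `localhost:8080` binding from the commands above):

```python
import json
import time
import urllib.request

def wait_for_health(url="http://localhost:8080/health", attempts=10, delay=1.0):
    """Poll the health endpoint until it reports healthy; False if it never does."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if json.load(resp).get("status") == "healthy":
                    return True
        except OSError:
            pass  # server not accepting connections yet; retry after a short delay
        time.sleep(delay)
    return False
```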
## Step 2: Get a Model

Let's download a small, fast model that's perfect for testing:

```bash
# Download a small conversational model (great for testing)
inferno models download microsoft/DialoGPT-small

# Alternative: use the demo model (built into Docker)
inferno models list
```

Expected output:

```
Available models:
microsoft-DialoGPT-small    gguf    1.2 GB    2024-12-16 10:30
```

✅ **Checkpoint:** You should see at least one model in your list.
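You can also list models programmatically. A hedged sketch, assuming Inferno's OpenAI-compatible API serves a `GET /v1/models` endpoint in the usual OpenAI response shape (that endpoint and its fields are an assumption here, not confirmed by the commands above):

```python
import json
import urllib.request

def extract_model_ids(payload):
    """Pull model ids out of an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]

def list_models(base_url="http://localhost:8080/v1"):
    """Fetch the model list from the running server and return the ids."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))
```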
## Step 3: Start the Server

If you're not using Docker, start the Inferno server:

```bash
# Start the server
inferno serve --bind 0.0.0.0:8080

# You should see:
# [INFO] Starting HTTP server on 0.0.0.0:8080
# [INFO] HTTP API server is running on http://0.0.0.0:8080
```

Keep this terminal open, and open a new one for the next steps.

✅ **Checkpoint:** The server should be running without errors.
## Step 4: Ask Your First Question

```bash
# Your first AI question!
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft-DialoGPT-small",
    "messages": [
      {"role": "user", "content": "Hello! What can you help me with?"}
    ],
    "max_tokens": 100
  }'
```

```bash
# Even simpler - use the CLI
inferno run --model microsoft-DialoGPT-small --prompt "Hello! What can you help me with?"
```

```bash
# Start an interactive chat session
inferno chat --model microsoft-DialoGPT-small
# Type your messages and press Enter
# Type 'exit' to quit
```

✅ **Checkpoint:** You should receive an AI-generated response!
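The same request can be made from Python with only the standard library. A minimal sketch (the helper names are illustrative; it assumes the server from the steps above is running on `localhost:8080`):

```python
import json
import urllib.request

def build_chat_request(model, prompt, max_tokens=100):
    """Build the JSON body used by the /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(model, prompt, base_url="http://localhost:8080/v1"):
    """POST a single chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```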
## Step 5: Stream Responses in Real Time

```bash
# See the response generate in real-time
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft-DialoGPT-small",
    "messages": [{"role": "user", "content": "Tell me a short story about AI"}],
    "stream": true
  }'
```
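With `"stream": true`, the response arrives as server-sent events, one `data:` line per token chunk. A hedged sketch of extracting the text from those lines (it assumes the chunks follow the OpenAI streaming convention of `choices[0].delta.content` and a final `data: [DONE]` sentinel):

```python
import json

def parse_sse_line(line):
    """Return the text delta carried by one SSE line, or None if it has none."""
    if not line.startswith("data: "):
        return None          # blank lines, comments, keep-alives
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None          # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")
```

Feeding each line of the streaming HTTP response through `parse_sse_line` and printing the non-`None` results reproduces the real-time output the curl command shows.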
Try a few different kinds of prompts:

```bash
# Code generation
inferno run --model microsoft-DialoGPT-small \
  --prompt "Write a Python function that calculates fibonacci numbers"

# Question answering
inferno run --model microsoft-DialoGPT-small \
  --prompt "What are the benefits of running AI locally?"

# Creative writing
inferno run --model microsoft-DialoGPT-small \
  --prompt "Write a haiku about privacy and AI"
```

You've successfully:
- ✅ Installed Inferno
- ✅ Downloaded and loaded a model
- ✅ Generated AI responses
- ✅ Tried different interaction methods

Next steps:

- **Try a Better Model:** Model Management - download larger, more capable models
- **Explore Features:** Usage Examples - see real-world use cases
- **Optimize Performance:** Performance Tuning - make it faster for your hardware
For Developers:

```python
# Set up the OpenAI-compatible endpoint
# Now use any OpenAI client library with Inferno!
import openai

client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# For example, send a chat request just as you would to OpenAI:
response = client.chat.completions.create(
    model="microsoft-DialoGPT-small",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

For Power Users:
```bash
# Enable authentication and monitoring
inferno serve --auth --metrics --config production.toml
```

For Businesses:

```bash
# Deploy with Docker Compose (includes monitoring)
wget https://raw.githubusercontent.com/ringo380/inferno/main/docker-compose.yml
docker-compose up -d
```

## Configuration

Create `inferno.toml` for persistent settings:
```toml
# ~/.config/inferno/inferno.toml
models_dir = "/home/user/ai-models"
log_level = "info"

[server]
bind_address = "0.0.0.0"
port = 8080

[backend_config]
gpu_enabled = true
context_size = 4096
```
```bash
# Quick configuration via environment
export INFERNO_LOG_LEVEL=debug
export INFERNO_MODELS_DIR="/custom/path/models"

# Start with custom config
inferno serve
```
## Troubleshooting

**Model missing?**

```bash
# List available models
inferno models list

# Download if missing
inferno models download model-name
```
**Server not responding?**

```bash
# Check if server is running
curl http://localhost:8080/health

# Check what's using port 8080
sudo lsof -i :8080

# Use a different port
inferno serve --bind 0.0.0.0:8081
```
**Out of memory?**

```bash
# Use a smaller model
inferno models download microsoft/DialoGPT-small

# Reduce the context size
inferno serve --context-size 2048
```
**Docker problems?**

```bash
# Check Docker status
docker ps

# View logs
docker logs inferno

# Restart the container
docker restart inferno
```

Tips:

- **Start Small:** Use 7B models first, then upgrade
- **SSD Storage:** Store models on an SSD for faster loading
- **GPU Memory:** Monitor GPU usage with `nvidia-smi`
- **Batch Processing:** Use `--batch` for multiple prompts
- **Save Responses:** Use `--output file.txt` to save results
- **Custom Instructions:** Create prompt templates for consistency
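The batch-processing tip can also be approximated over the HTTP API by looping prompts through the chat endpoint. A sketch of that flag-free alternative (the helper name is illustrative; it assumes the default `localhost:8080` server):

```python
import json
import urllib.request

def run_batch(prompts, model, base_url="http://localhost:8080/v1"):
    """Send each prompt to the chat endpoint and collect the replies."""
    results = {}
    for prompt in prompts:
        body = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 100,
        }).encode()
        req = urllib.request.Request(
            f"{base_url}/chat/completions",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                reply = json.load(resp)
            results[prompt] = reply["choices"][0]["message"]["content"]
        except OSError as exc:
            # Record the failure and keep processing the remaining prompts
            results[prompt] = f"error: {exc}"
    return results
```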
```bash
# Try different AI tasks
inferno run --model your-model --prompt "Summarize: [paste long text]"
inferno run --model your-model --prompt "Translate to Spanish: Hello world"
inferno run --model your-model --prompt "Extract key points from: [paste content]"
inferno run --model your-model --prompt "Fix this code: [paste code]"
```

Learn more:

- Usage Examples - Real-world scenarios and solutions
- Model Management - How to find, download, and organize models
- Configuration Guide - Detailed configuration options
- API Examples - Integration with your applications
- GitHub Discussions - Get help and share experiences
- GitHub Discussions - Feature requests and technical discussions
- FAQ - Common questions and answers
- Production Deployment - Scale to handle real workloads
- Security Hardening - Secure your installation
- Monitoring Setup - Track performance and usage
- Load Balancing - Distribute across multiple instances
Getting help:

- **Quick questions:** Check our FAQ
- **Technical issues:** See Troubleshooting
- **Community help:** Visit GitHub Discussions
- **Bug reports:** GitHub Issues
You're now part of a growing community of developers, researchers, and organizations using AI while maintaining privacy and control.
Share your success on GitHub Discussions - we love hearing about your use cases!
Tutorial last updated for Inferno v1.0.0. Found this helpful? Consider Contributing to Wiki to help others!