Frequently asked questions about installing, running, and troubleshooting Dream Server.
Also see: `docs/FAQ.md` for hardware requirements, pricing, and comparisons with alternatives.
Dream Server is a turnkey local AI stack that runs entirely on your own hardware. It includes:
- LLM inference via llama-server (qwen2.5-32b-instruct)
- Web dashboard for chat and model management
- Voice capabilities (STT via Whisper, TTS via Kokoro)
- Workflow automation via n8n
- API gateway with privacy shield for external services
Minimum (bootstrap mode):
- Any modern CPU
- 8GB RAM
- 10GB disk space
- Docker + Docker Compose
Recommended (full experience):
- NVIDIA GPU with 24GB+ VRAM (RTX 3090/4090)
- 32GB+ system RAM
- 100GB+ SSD storage
- Ubuntu 22.04/24.04 or WSL2 on Windows
Initial setup: Yes, to download models and Docker images.
After setup: No. Dream Server is designed for offline/air-gapped operation. All models run locally.
Yes. Everything runs on your hardware:
- Conversations never leave your machine
- Voice processing is local
- API calls to external services go through the Privacy Shield (PII redaction)
- No telemetry or analytics
Dream Server is free and open source (Apache 2.0 license). You only pay for:
- Your hardware (one-time cost)
- Electricity to run it
Linux:

```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
```

Windows: Install Docker Desktop from https://docs.docker.com/desktop/install/windows-install/ Enable the WSL2 backend in Docker Desktop settings.
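After installing, it's worth confirming that Docker and the Compose plugin are actually on your PATH before running the installer. A quick guarded check (it degrades gracefully if Docker is absent):

```shell
# Sanity-check the Docker toolchain before running install.sh.
check_docker() {
  if command -v docker >/dev/null 2>&1; then
    docker --version
    docker compose version 2>/dev/null || echo "Compose plugin missing - install docker-compose-plugin"
  else
    echo "docker not found on PATH - revisit the install steps above"
  fi
}
check_docker
```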
Make the script executable and run it:

```bash
chmod +x install.sh
./install.sh
```

This is normal for large models (20GB+). The installer shows progress bars with:
- Download speed
- Time elapsed
- ETA
To speed up: Use a wired connection. WiFi can be unstable for large downloads.
To restart: The installer resumes partial downloads automatically.
```bash
./scripts/upgrade-model.sh
```

This hot-swaps from the 1.5B bootstrap model to your full model without downtime.
```bash
./install.sh --no-bootstrap
```

This downloads the full model first. You'll wait longer before first use.
Use the dream CLI:

```bash
dream model current   # See what's running
dream model list      # Show available tiers and models
dream model swap T3   # Switch to Tier 3 (e.g., Qwen3 14B)
```

The model file must already be downloaded. If it isn't, pre-fetch it first:

```bash
./scripts/pre-download.sh --tier 3
```

Yes. Drop the .gguf file into `data/models/`, then update `.env`:

```bash
GGUF_FILE=my-model.gguf
LLM_MODEL=my-model
```

Restart the inference server:

```bash
docker compose restart llama-server
```

The model will load in ~30-120 seconds depending on size. If it fails, Dream Server automatically rolls back to the previous model.
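The `.env` edit above can also be scripted. A minimal sketch that swaps the two keys with `sed`, shown against a scratch copy (the "old-model" values below are placeholders, not actual defaults; point the `sed` at your real `dream-server/.env` in practice):

```shell
# Demo: switch GGUF_FILE and LLM_MODEL in a scratch copy of .env.
envdir=$(mktemp -d)
cat > "$envdir/.env" <<'EOF'
GGUF_FILE=old-model.gguf
LLM_MODEL=old-model
EOF

sed -i \
  -e 's|^GGUF_FILE=.*|GGUF_FILE=my-model.gguf|' \
  -e 's|^LLM_MODEL=.*|LLM_MODEL=my-model|' \
  "$envdir/.env"

cat "$envdir/.env"
# On a real install, follow with: docker compose restart llama-server
```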
The installer auto-selects based on your GPU, but you can switch between any tier:
| Tier | Model | Min VRAM |
|---|---|---|
| T1 | Qwen3 8B | 8 GB |
| T2 | Qwen3 8B | 12 GB |
| T3 | Qwen3 14B | 20 GB |
| T4 | Qwen3 30B-A3B (MoE) | 40 GB |
| SH_COMPACT | Qwen3 30B-A3B (MoE) | 64 GB unified |
| SH_LARGE | Qwen3 Coder Next 80B (MoE) | 90 GB unified |
Run `dream model list` for the full list on your system.
Check driver:

```bash
nvidia-smi
```

If it's missing, install the NVIDIA drivers:

```bash
# Ubuntu
sudo apt update
sudo apt install nvidia-driver-550
sudo reboot
```

Check the Docker runtime:

```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

Your GPU doesn't have enough VRAM. Options:

- Use a smaller model (qwen2.5-7b-instruct instead of 32b)
- Use a more aggressively quantized GGUF (models ship as Q4_K_M by default)
- Reduce `CTX_SIZE` in `.env` (try 4096)
- Run on CPU only (slower but works)
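As a rough sanity check before picking a model: a Q4_K_M GGUF needs on the order of 0.6 GB of VRAM per billion parameters for weights, plus context-dependent overhead. A back-of-the-envelope sketch (the 0.6 factor and the +2 GB overhead are approximations, not exact figures):

```shell
# Rough VRAM estimate for a Q4_K_M model: params (billions) * ~0.6 GB + ~2 GB overhead.
estimate_vram_gb() {
  params_b=$1
  awk -v p="$params_b" 'BEGIN { printf "%.1f\n", p * 0.6 + 2 }'
}
estimate_vram_gb 7    # ~7B model
estimate_vram_gb 32   # ~32B model
```

If the estimate exceeds your card's VRAM, pick a smaller model or shrink the context.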
Enable WSL2 manually:
```bash
wsl --install -d Ubuntu-24.04
wsl --set-default-version 2
```

Then restart the installer.
Check if services are running:

```bash
docker compose ps
```

Check logs:

```bash
docker compose logs dashboard-api
docker compose logs llama-server
```

Common fixes:

- Wait 30 seconds for services to start
- Check http://localhost:3001 (direct API) vs http://localhost:3000 (UI)
- Restart: `docker compose restart`
```bash
docker compose down      # Stop and remove containers
rm -rf ~/dream-server    # Remove installation directory (optional)
```

This removes the Docker containers. Add `-v` to `docker compose down` to also remove volumes, including downloaded models and data.
http://localhost:3000
On first run, the installer displays a QR code. Scan it with your phone for instant mobile access.
The installer generates secure random passwords and displays them at the end. Look for:
✓ Dashboard URL: http://localhost:3000
✓ API Key: dsf8a9s7df8a9s7df...
Passwords are also saved to .env in the dream-server directory.
Edit `.env`:

```bash
nano .env
# Change: DASHBOARD_PASSWORD=your-new-password
docker compose restart dashboard
```

Yes! Use your machine's local IP:
http://192.168.1.xxx:3000
The installer shows this URL with a QR code at the end.
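If you missed the installer's QR code, you can recover the LAN URL yourself. A small sketch (assumes `hostname -I` is available, as it is on Ubuntu; it falls back to a placeholder otherwise):

```shell
# Print the dashboard URL for other devices on your LAN.
lan_url() {
  lan_ip=$(hostname -I 2>/dev/null | awk '{print $1}')
  [ -n "$lan_ip" ] || lan_ip="<your-lan-ip>"
  echo "http://$lan_ip:3000"
}
lan_url
```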
- Open http://localhost:3000/workflows
- Click "New Workflow"
- Select a template or start from scratch
- Connect nodes (triggers → actions)
- Save and activate
n8n is the workflow engine built into Dream Server. It provides:
- Visual workflow editor
- 400+ integrations (GitHub, Slack, email, etc.)
- Webhook triggers
- Scheduled jobs
- AI agent capabilities
Yes, through the Privacy Shield:
- Configure the shield service (runs on port 8085)
- Route API calls through `http://localhost:8085/proxy/{service}`
- PII is automatically redacted before leaving your network
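A concrete sketch of the routing above. The `/proxy/{service}` shape comes from the docs; the `openai` service name, downstream path, and payload are assumptions, and the request only fires if the shield is actually reachable:

```shell
# Send an OpenAI-style chat request through the Privacy Shield proxy (sketch).
shield_chat() {
  shield="http://localhost:8085/proxy/openai"   # service name and path are assumptions
  payload='{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
  command -v curl >/dev/null 2>&1 || { echo "curl not installed"; return; }
  if curl -s --max-time 2 -o /dev/null "$shield" 2>/dev/null; then
    curl -s -X POST "$shield/v1/chat/completions" \
      -H 'Content-Type: application/json' -d "$payload"
  else
    echo "privacy shield not reachable at $shield"
  fi
}
shield_chat
```

Any PII in the payload is redacted by the shield before the request leaves your network.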
Prerequisites: Microphone and speakers/headphones
- Open the Voice page in the dashboard
- Click "Start Conversation"
- Allow microphone access
- Speak naturally — the system handles STT → LLM → TTS automatically
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| tiny | ~400ms | Good | Quick commands |
| base | ~700ms | Better | General use |
| small | ~2s | Best | Accuracy critical |
| large-v3 | ~8s | Excellent | Offline transcription |
Default is base. Change in Settings → Voice.
Kokoro provides high-quality voices. Options:
- `af_bella` — Natural female (default)
- `af_nicole` — Professional female
- `am_adam` — Natural male
- `am_michael` — Professional male
Preview voices in Settings → Voice → Test.
All services:

```bash
docker compose logs -f
```

Specific service:

```bash
docker compose logs -f llama-server
docker compose logs -f dashboard-api
docker compose logs -f voice-agent
```

To file:

```bash
docker compose logs > dream-server.log 2>&1
```

```bash
docker compose down
docker compose up -d
```

Or restart specific services:

```bash
docker compose restart llama-server
```

- Check if the API container is running: `docker compose ps dashboard-api`
- Check logs: `docker compose logs dashboard-api`
- Verify port 3001 is not in use: `sudo lsof -i :3001`
- Restart: `docker compose restart dashboard-api`
Check disk space:

```bash
df -h
```

Models need ~20GB each. Free up space if needed.

Check model download:

```bash
ls -la data/models/
```

If empty or incomplete, re-download:

```bash
./scripts/pre-download.sh
```

STT issues:
- Check microphone input level
- Reduce background noise
- Try a different STT model (base → small)
TTS issues:
- Check speaker/headphone connection
- Adjust TTS speed in Settings
- Try different voices
Check GPU utilization:

```bash
nvidia-smi
```

If the GPU is at 100%, you're GPU-bound. Solutions:
- Reduce concurrent requests
- Use a smaller model
- Enable KV cache quantization
Check if the model is running on CPU:

If `nvidia-smi` shows no process, the model is running on CPU (very slow). Fix the GPU detection issues described above.
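KV cache quantization (the last option above) maps to standard llama.cpp server flags. A hypothetical compose override, assuming your llama-server service accepts extra command arguments; the service name and the variable plumbing are assumptions, and the flag names should be verified against your llama-server version:

```yaml
# docker-compose.override.yml sketch - service name and arg wiring are assumptions
services:
  llama-server:
    command: >
      --model /models/${GGUF_FILE}
      --ctx-size ${CTX_SIZE}
      --cache-type-k q8_0
      --cache-type-v q8_0
```

Quantizing the KV cache to q8_0 roughly halves its VRAM footprint versus f16, at a small quality cost.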
The Privacy Shield has rate limiting to prevent abuse. Default: 100 requests/minute.
To increase:
- Edit `.env`
- Raise `RATE_LIMIT_REQUESTS_PER_MINUTE` (default 100)
- Restart: `docker compose restart privacy-shield`
Check webhook URL: Must be accessible from the triggering service.
Check n8n logs:

```bash
docker compose logs n8n
```

Verify the workflow is active: in the workflow editor, the toggle must be ON (green).
Clean up unused volumes:

```bash
docker volume prune
```

Or remove everything (destructive):

```bash
docker compose down -v
```

See How do I switch to a different model? and Can I use my own GGUF model? above.

Short version: drop your .gguf file into `data/models/`, set `GGUF_FILE` and `LLM_MODEL` in `.env`, then run `docker compose restart llama-server`. Rollback is automatic on failure.
For production deployments, use a reverse proxy (nginx, Caddy, Traefik) in front of Dream Server:
```bash
# Example with Caddy (auto-HTTPS with Let's Encrypt)
caddy reverse-proxy --from your-domain.com --to localhost:3000
```

For local development, browsers accept self-signed certs at https://localhost.
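If you prefer nginx over Caddy, a minimal reverse-proxy sketch. The server name and certificate paths are placeholders; the WebSocket upgrade headers are included because chat UIs typically rely on live connections:

```nginx
# /etc/nginx/conf.d/dream-server.conf - hypothetical example
server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate     /etc/ssl/certs/your-domain.crt;    # placeholder paths
    ssl_certificate_key /etc/ssl/private/your-domain.key;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # WebSocket upgrade, needed for streaming chat
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```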
Yes! Edit docker-compose.nvidia.yml to expose multiple GPUs:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 2            # Number of GPUs
          capabilities: [gpu]
```

Configs and data:

```bash
tar -czf dream-server-backup.tar.gz .env data/
```

Models (large):

```bash
rsync -av models/ /backup/location/models/
```

```bash
./dream-update.sh
```

Or manually:

```bash
git pull
docker compose pull
docker compose up -d
```

This pulls the latest code, updates Docker images, and migrates data.
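Before updating, consider taking a quick backup (see the backup question above). A dated-archive sketch, demoed here in a throwaway directory; run `backup_dream` from your real dream-server directory in practice:

```shell
# Create a dated backup of .env and data/.
backup_dream() {
  stamp=$(date +%Y-%m-%d)
  tar -czf "dream-server-backup-$stamp.tar.gz" .env data/
  echo "wrote dream-server-backup-$stamp.tar.gz"
}

# Demo fixture - replace with: cd ~/dream-server && backup_dream
demo=$(mktemp -d)
cd "$demo"
mkdir -p data/models
echo "DASHBOARD_PASSWORD=example" > .env
backup_dream
```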
SQLite databases are in Docker volumes:

- `dream-server_n8n-data` — Workflows and credentials
- `dream-server_agent-monitor` — Metrics and logs

Access via:

```bash
docker compose exec n8n sqlite3 /home/node/.n8n/database.sqlite
```

Yes, through the Privacy Shield. Configure in Settings → API Keys.

Your requests go: You → Shield (PII redaction) → OpenAI → Shield (deanonymization) → You
Open the Dashboard → Metrics page for:
- GPU utilization and temperature
- Request latency (P50, P95, P99)
- Token throughput
- Active connections
Or use the API:

```bash
curl http://localhost:3001/api/metrics
```

| Port | Service |
|---|---|
| 3000 | Open WebUI (chat interface) |
| 3001 | Dashboard |
| 3002 | Dashboard API |
| 8080 | llama-server API |
| 8085 | Privacy Shield |
| 5678 | n8n workflow editor |
| 7880 | LiveKit voice server |
| 9000 | Whisper STT |
| 8880 | Kokoro TTS |
| 6333 | Qdrant vector DB |
| 8090 | Embeddings service |
Edit `.env`:

```bash
DASHBOARD_PORT=8080
```

Then restart: `docker compose up -d`
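Before picking a new port, it's worth checking that nothing else is listening on it. A small sketch using `ss` (present on modern Ubuntu; it falls back with a hint if not):

```shell
# Report whether a TCP port is already in use locally.
port_free() {
  port=$1
  if ! command -v ss >/dev/null 2>&1; then
    echo "ss not available - check manually with: sudo lsof -i :$port"
  elif ss -ltn 2>/dev/null | grep -Eq ":$port([[:space:]]|$)"; then
    echo "port $port is already in use"
  else
    echo "port $port looks free"
  fi
}
port_free 8080
```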
- Main README: `dream-server/README.md`
- Installer Architecture: `docs/INSTALLER-ARCHITECTURE.md`
- Security: `SECURITY.md`
- GitHub Issues: https://github.com/Light-Heart-Labs/DreamServer/issues
- Discord: #general channel
Include this output:

```bash
# Collect system info
echo "=== Docker Compose ===" && docker compose version
echo "=== Services ===" && docker compose ps
echo "=== Recent Logs ===" && docker compose logs --tail=50
echo "=== GPU ===" && nvidia-smi 2>/dev/null || echo "No GPU"
```

Copy the output into your GitHub issue.
Last updated: 2026-03-05