Quick answers to common questions.
Looking for install/runtime troubleshooting? See TROUBLESHOOTING.md and INSTALL-TROUBLESHOOTING.md.
Lightweight (runs on anything):
- GPU: Any (or CPU-only)
- RAM: 4GB+
- Storage: 15GB free
- Model: Qwen3.5 2B (auto-selected)
Minimum:
- GPU: RTX 3060 12GB or RTX 4060 8GB
- RAM: 32GB
- Storage: 500GB NVMe SSD
- CPU: Any modern quad-core
Recommended (comfortable daily use):
- GPU: RTX 4070 Ti Super 16GB or RTX 4090 24GB
- RAM: 64GB
- Storage: 1TB NVMe SSD
Why these specs?
- 12GB VRAM = 7B-14B models, basic tasks
- 16GB VRAM = 32B models with reduced context
- 24GB VRAM = 32B models with full context, voice pipeline
- 48GB+ VRAM (2x 4090) = Multiple models, concurrent users
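These tiers follow from a rough rule of thumb (an approximation, not an exact sizing): a quantized model needs about params-in-billions × bits-per-weight ÷ 8 GB of VRAM, plus roughly 20% overhead for KV cache and activations:

```shell
# Rough VRAM estimate for a quantized model (approximation only):
# params in billions, bits per weight, ~20% overhead for KV cache/activations
awk 'BEGIN { params=32; bits=4; printf "%.1f GB\n", params * bits / 8 * 1.2 }'
# → 19.2 GB
```

A 32B model at 4-bit quantization lands around 19GB, which is why the 24GB tier runs 32B models with full context while 16GB cards need reduced context.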
| Tier | GPU | Total Build | What You Get |
|---|---|---|---|
| Entry | RTX 3060 12GB | $800-1,200 | Basic chat, slow but works |
| Prosumer | RTX 4070 Ti 16GB | $2,000-3,000 | Comfortable single-user |
| Pro | RTX 4090 24GB | $4,000-6,000 | Fast, voice agents, 5-10 users |
| Enterprise | 2x RTX 4090 | $12,000-18,000 | 20-40 concurrent users |
See HARDWARE-GUIDE.md for full breakdown.
- Idle: 50-100W (~$5-15/month)
- Active inference: 300-450W per GPU
- 24/7 heavy use: $30-80/month depending on rates
Still cheaper than cloud API bills at moderate usage.
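The arithmetic behind those figures is simple; a sketch, assuming a $0.15/kWh rate (rates vary widely by region):

```shell
# Monthly electricity cost: watts / 1000 * hours/day * 30 days * $/kWh
awk 'BEGIN { watts=75; hours=24; rate=0.15; printf "$%.2f/month\n", watts/1000 * hours * 30 * rate }'
# → $8.10/month for a 75W idle box
```

Plug in your own wattage and local rate to estimate your bill.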
Out of the box:
- 💬 ChatGPT-style web interface (Open WebUI)
- 🎤 Voice transcription (Whisper)
- 🔊 Text-to-speech (Kokoro)
- 📄 Document Q&A with RAG (Qdrant + embeddings)
- 🔗 API integration (OpenAI-compatible endpoints)
- 🤖 Agent workflows (n8n)
With voice profile:
- 🎙️ Full voice agents (speak in, speak out)
- Real-time conversations at <2s latency
With optional components:
- 🔒 Privacy Shield (PII redaction proxy)
- 🖼️ Image generation (if you add FLUX/SD)
- 🔍 Local web search (SearXNG)
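Because the endpoints are OpenAI-compatible, any OpenAI client or plain curl works against them. A minimal sketch; the host, port, and model name here are assumptions, so adjust them for your deployment:

```shell
# Hypothetical host/port/model -- adjust for your deployment.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Existing tools that speak the OpenAI API can usually be pointed at the server just by changing the base URL.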
Real benchmarks from our dual-4090 cluster:
| Scenario | Latency | Concurrent Users |
|---|---|---|
| Single chat request | ~1.4s | 1 |
| 10 simultaneous chats | ~1.5s | 10 |
| 20 simultaneous chats | ~1.6s | 20 |
| Voice agent (full round-trip) | <2s | 15-20 per GPU |
Your results depend on your hardware tier; a single 4090 gives roughly half the concurrent capacity.
Honest answer: for most tasks, 32B local models are 80-90% as capable as frontier cloud models.
Where local wins:
- Speed (no network latency)
- Privacy (data never leaves your network)
- Cost (no per-token fees)
- Control (choose your model, tune prompts, no content filters)
Where cloud wins:
- Cutting-edge reasoning (GPT-4, Claude 3.5)
- Multimodal (vision, though Qwen-VL is catching up)
- Zero maintenance
Our recommendation: Use local for daily work, cloud for edge cases.
Example: 100,000 tokens/day usage
| Option | Monthly Cost | Notes |
|---|---|---|
| OpenAI GPT-4 | ~$300-600 | Per-token billing |
| Claude API | ~$200-400 | Per-token billing |
| Dream Server | $30-80 | Electricity only (after hardware) |
Break-even timeline:
- Light use (~$50/mo API): 2-3 years
- Medium use (~$200/mo API): 6-12 months
- Heavy use (~$500+/mo API): 3-6 months
Plus: No usage caps, no rate limits, no surprise bills.
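The break-even math is just hardware cost divided by monthly savings. A sketch using the Entry-tier numbers above (illustrative, not a quote):

```shell
# Months until the hardware pays for itself vs. a cloud API bill
awk 'BEGIN { hardware=1200; api=200; electricity=50; printf "%.0f months\n", hardware / (api - electricity) }'
# → 8 months
```

A $1,200 build against a $200/month API bill (minus $50/month electricity) pays for itself in 8 months, consistent with the medium-use estimate above.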
Time investment:
- Initial setup: 1-2 hours with install wizard
- Ongoing maintenance: ~30 min/month (updates, monitoring)
- Model updates: Optional, 1-click when you want them
No paid support required for most users. Community Discord available.
Yes, 100%. Your prompts never leave your local network.
- No data sent to cloud providers
- No logging by third parties
- No training data contribution
- Full GDPR/HIPAA compliance capability
Yes. Common use cases:
- Legal document review
- Medical record analysis
- Financial data processing
- Internal company communications
- Client confidential work
Optional: Add Privacy Shield for automatic PII redaction as an extra layer.
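To illustrate the idea (a toy sed pattern, not how Privacy Shield is actually implemented), redaction rewrites identifiers before text crosses a trust boundary:

```shell
# Toy email redaction; the real proxy covers many more PII categories.
echo "Contact alice@example.com for details" | \
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[REDACTED]/g'
# → Contact [REDACTED] for details
```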
- Models run in Docker containers (isolated)
- No outbound network required after initial download
- You control which models to run
- Can air-gap the server if needed
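One way to enforce the no-outbound property after the initial download is a Docker internal network; a sketch (the network name here is made up):

```shell
# An --internal network gives attached containers no route to the outside world.
docker network create --internal dream-airgap
```

Containers attached only to this network can talk to each other but cannot reach the internet.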
With install wizard: Under 1 hour for someone comfortable with terminal.
```bash
curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/get-dream-server.sh | bash
```

The wizard:
- Detects your hardware
- Recommends configuration
- Downloads models
- Starts services
- Runs health checks
Options:
- Pre-configured hardware: We can ship ready-to-plug-in units
- Remote setup service: $200-500 depending on complexity
- Detailed guides: Step-by-step docs for common scenarios
```bash
./dream-cli update
```

That's it. Updates are optional — you control when to apply them.
- This documentation
- TROUBLESHOOTING.md for common issues
- GitHub Issues: https://github.com/Light-Heart-Labs/DreamServer/issues
- Discord community (link in README)
| Feature | Dream Server | Ollama |
|---|---|---|
| Web UI | ✅ Built-in (Open WebUI) | ❌ Separate install |
| Voice | ✅ Full pipeline | ❌ Not included |
| RAG | ✅ Built-in | ❌ Not included |
| n8n workflows | ✅ Included | ❌ Not included |
| One-command setup | ✅ Yes | ✅ Yes (runtime only) |
| Performance | ✅ llama-server (faster) | llama.cpp wrapper |
Ollama is great for quick experiments. Dream Server is a complete production stack.
LocalAI is developer-focused. Dream Server is user-focused.
- LocalAI: More flexibility, more configuration needed
- Dream Server: Opinionated defaults, works out of box
See "Cost & ROI" section above. TL;DR: Local is cheaper at scale, more private, but requires hardware investment.
Built by Light Heart Labs / The Collective