LLM inference, chat UI, voice agents, workflow automation, RAG, image generation, and privacy tools — all running on your hardware. No cloud. No subscriptions. No configuration.
New here? Read the Friendly Guide or listen to the audio version — a complete walkthrough of what Dream Server is, how it works, and how to make it your own. No technical background needed.
Platform Support — March 2026

| Platform | Status |
|---|---|
| Linux (NVIDIA + AMD) | Supported — install and run today |
| Windows (NVIDIA + AMD) | Supported — install and run today |
| macOS (Apple Silicon) | Supported — install and run today |

Tested Linux distros: Ubuntu 24.04/22.04, Debian 12, Fedora 41+, Arch Linux, CachyOS, openSUSE Tumbleweed. Other distros using apt, dnf, pacman, or zypper should also work — open an issue if yours doesn't.
Windows: Requires Docker Desktop with WSL2 backend. NVIDIA GPUs use Docker GPU passthrough; AMD Strix Halo runs llama-server natively with Vulkan.
macOS: Requires Apple Silicon (M1+) and Docker Desktop. llama-server runs natively with Metal GPU acceleration; all other services run in Docker.
See the Support Matrix for details.
Setting up local AI usually means stitching together a dozen projects, debugging CUDA drivers, writing Docker configs, and hoping everything talks to each other. Dream Server replaces all of that with a single installer.
- Run one command — the installer detects your GPU, picks the right model for your hardware, generates secure credentials, and launches everything
- Chat in under 2 minutes — bootstrap mode starts a small model instantly while your full model downloads in the background
- 13 integrated services — chat, agents, voice, workflows, search, RAG, image generation, and more, all pre-wired and working together
- Fully moddable — drop in a folder, run `dream enable`, done. Every service is an extension

curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/dream-server/get-dream-server.sh | bash

Open http://localhost:3000 and start chatting.
No GPU? Dream Server also runs in cloud mode — same full stack, powered by OpenAI/Anthropic/Together APIs instead of local inference:
./install.sh --cloud
Port conflicts? Every port is configurable via environment variables. See `.env.example` for the full list, or override at install time:

WEBUI_PORT=9090 ./install.sh
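
As a rough illustration of how an install-time override like this usually works in a shell script, the sketch below takes a value from the environment and falls back to a default when nothing is set. The helper name `resolve_port` is hypothetical, and the default of 3000 is only inferred from the `http://localhost:3000` URL above; none of this is taken from `install.sh`.

```shell
# Hypothetical helper (not part of Dream Server): pick a port from an
# optional override, falling back to a default, and validate the result.
resolve_port() {
  local override="$1"
  local default_port="$2"
  # Use the override when it is non-empty, otherwise the default.
  local value="${override:-$default_port}"

  # Reject empty strings and anything containing a non-digit.
  case "$value" in
    ''|*[!0-9]*)
      echo "invalid port: $value" >&2
      return 1
      ;;
  esac

  # Reject numbers outside the valid TCP port range.
  if [ "$value" -lt 1 ] || [ "$value" -gt 65535 ]; then
    echo "port out of range: $value" >&2
    return 1
  fi

  echo "$value"
}
```

Calling `resolve_port "${WEBUI_PORT:-}" 3000` prints the override if `WEBUI_PORT` is set and the default otherwise.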
Manual install (Linux)
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh

Windows (PowerShell)
Requires Docker Desktop with WSL2 backend enabled. Install Docker Desktop first and make sure it is running before you start.
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer
.\install.ps1

The installer detects your GPU, picks the right model, generates credentials, starts all services, and creates a Desktop shortcut to the Dashboard. Manage with .\dream-server\installers\windows\dream.ps1 status.
macOS (Apple Silicon)
Requires Apple Silicon (M1+) and Docker Desktop. Install Docker Desktop first and make sure it is running before you start.
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh

The installer detects your chip, picks the right model for your unified memory, launches llama-server natively with Metal acceleration, and starts all other services in Docker. Manage with ./dream-macos.sh status.
See the macOS Quickstart for details.
- Open WebUI — full-featured chat interface with conversation history, web search, document upload, and 30+ languages
- llama-server — high-performance LLM inference with continuous batching, auto-selected for your GPU
- LiteLLM — API gateway supporting local/cloud/hybrid modes
- Whisper — speech-to-text
- Kokoro — text-to-speech
- OpenClaw — autonomous AI agent framework
- n8n — workflow automation with 400+ integrations (Slack, email, databases, APIs)
- Qdrant — vector database for retrieval-augmented generation (RAG)
- SearXNG — self-hosted web search (no tracking)
- Perplexica — deep research engine
- ComfyUI — node-based image generation
- Privacy Shield — PII scrubbing proxy for API calls
- Dashboard — real-time GPU metrics, service health, model management
The installer detects your GPU and picks the optimal model automatically. No manual configuration.
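
For readers curious what GPU detection can look like on NVIDIA hardware, here is a minimal sketch. It relies on `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints one MiB figure per GPU. The function names are hypothetical, and summing memory across GPUs is an assumption, not a description of what the installer actually does.

```shell
# Hypothetical helper: read one MiB figure per line (one line per GPU)
# on stdin and print the total, rounded to the nearest whole GiB.
total_vram_gb() {
  awk '{ sum += $1 } END { printf "%d\n", int(sum / 1024 + 0.5) }'
}

# Hypothetical wrapper: query nvidia-smi if it is installed.
# Returns non-zero when no NVIDIA tooling is available.
detect_nvidia_vram_gb() {
  command -v nvidia-smi >/dev/null 2>&1 || return 1
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | total_vram_gb
}
```

Rounding matters here: a card sold as 24 GB typically reports slightly under 24576 MiB, so truncating would place it one gigabyte too low.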

NVIDIA

| VRAM | Model | Example GPUs |
|---|---|---|
| 8–11 GB | Qwen 2.5 7B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
| 12–20 GB | Qwen 2.5 14B (Q4_K_M) | RTX 3090, RTX 4080 |
| 20–40 GB | Qwen 2.5 32B (Q4_K_M) | RTX 4090, A6000 |
| 40+ GB | Qwen 2.5 72B (Q4_K_M) | A100, multi-GPU |
| 90+ GB | Qwen3 Coder Next 80B MoE | Multi-GPU A100/H100 |

AMD Strix Halo (unified memory)

| Unified RAM | Model | Hardware |
|---|---|---|
| 64–89 GB | Qwen3 30B-A3B (30B MoE) | Ryzen AI MAX+ 395 (64GB) |
| 90+ GB | Qwen3 Coder Next (80B MoE) | Ryzen AI MAX+ 395 (96GB) |

Apple Silicon (unified memory)

| Unified RAM | Model | Example Hardware |
|---|---|---|
| 8–24 GB | Qwen3 4B (Q4_K_M) | M1/M2 base, M4 Mac Mini (16GB) |
| 32 GB | Qwen3 8B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 48 GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M4 Pro (48GB), M2 Max (48GB) |
| 64+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M2 Ultra Mac Studio, M4 Max (64GB+) |
Override tier selection: ./install.sh --tier 3
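
The mapping in the first table above (the one keyed on VRAM) can be sketched as a simple threshold lookup. The function name `pick_nvidia_model` is hypothetical. Because the published ranges overlap at 20 GB and at 90 GB, this sketch assumes the larger model wins at each boundary; the real installer may decide differently.

```shell
# Hypothetical helper: map whole gigabytes of VRAM to the model named
# in the VRAM table. Thresholds are checked from largest to smallest.
pick_nvidia_model() {
  local vram_gb="$1"
  if   [ "$vram_gb" -ge 90 ]; then echo "Qwen3 Coder Next 80B MoE"
  elif [ "$vram_gb" -ge 40 ]; then echo "Qwen 2.5 72B (Q4_K_M)"
  elif [ "$vram_gb" -ge 20 ]; then echo "Qwen 2.5 32B (Q4_K_M)"
  elif [ "$vram_gb" -ge 12 ]; then echo "Qwen 2.5 14B (Q4_K_M)"
  elif [ "$vram_gb" -ge 8 ];  then echo "Qwen 2.5 7B (Q4_K_M)"
  else
    # Below the smallest listed range there is no row to pick from.
    echo "no matching row for ${vram_gb} GB" >&2
    return 1
  fi
}
```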
No waiting for large downloads. Dream Server uses bootstrap mode by default:
- Downloads a tiny 1.5B model in under a minute
- You start chatting immediately
- The full model downloads in the background
- Hot-swap to the full model when it's ready — zero downtime
The installer pulls all services in parallel. Downloads are resume-capable — interrupted downloads pick up where they left off.
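
To show what "resume-capable" can mean in practice, the sketch below wraps `curl` with `-C -`, which tells curl to inspect the partial file and continue from its current size. Whether Dream Server uses curl for model downloads is an assumption; the function name `resume_download` and the `DRY_RUN` switch are hypothetical.

```shell
# Hypothetical helper: download a file, resuming a partial download
# if the destination already exists. With DRY_RUN=1 it only prints
# the command it would run.
resume_download() {
  local url="$1"
  local dest="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl -fL -C - -o $dest $url"
    return 0
  fi
  # -f: fail on HTTP errors; -L: follow redirects;
  # -C -: resume from wherever the existing partial file ends.
  curl -fL -C - -o "$dest" "$url"
}
```

Re-running the same call after an interruption picks up from the bytes already on disk, provided the server supports range requests.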
Skip bootstrap: ./install.sh --no-bootstrap
The installer picks a model for your hardware, but you can switch anytime:
dream model current # What's running now?
dream model list # Show all available tiers
dream model swap T3 # Switch to a different tier

If the new model isn't downloaded yet, pre-fetch it first:

./scripts/pre-download.sh --tier 3 # Download before switching
dream model swap T3 # Then swap (restarts llama-server)

Already have a GGUF you want to use? Drop it in data/models/, update GGUF_FILE and LLM_MODEL in .env, and restart:

docker compose restart llama-server

Rollback is automatic — if a new model fails to load, Dream Server reverts to your previous model.
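
For the custom GGUF step, the `.env` update can be done by hand in any editor. If you prefer to script it, a minimal sketch follows. The helper `set_env_var` is hypothetical and not part of the `dream` CLI, and it assumes plain `KEY=value` lines with no quoting.

```shell
# Hypothetical helper: set KEY=value in an env file, replacing any
# existing assignment of KEY. The key ends up on the last line.
set_env_var() {
  local env_file="$1"
  local key="$2"
  local value="$3"
  local tmp_file
  tmp_file="$(mktemp)" || return 1

  # Copy every line except an existing assignment of this key.
  # grep exits non-zero when it prints nothing, which is fine here.
  if [ -f "$env_file" ]; then
    grep -v "^${key}=" "$env_file" > "$tmp_file" || true
  fi

  # Append the new assignment, then swap the file into place.
  printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
  mv "$tmp_file" "$env_file"
}
```

For example, `set_env_var .env GGUF_FILE my-model.gguf` (with `my-model.gguf` standing in for your own file name) followed by the restart command above.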
Dream Server is designed to be modded. Every service is an extension — a folder with a manifest.yaml and a compose.yaml. The dashboard, CLI, health checks, and compose stack all discover extensions automatically.
extensions/services/
my-service/
manifest.yaml # Metadata: name, port, health endpoint, GPU backends
compose.yaml # Docker Compose fragment (auto-merged into the stack)
dream enable my-service # Enable it
dream disable my-service # Disable it
dream list # See everything

The installer itself is modular — 6 libraries and 13 phases, each in its own file. Want to add a hardware tier, swap a default model, or skip a phase? Edit one file.
Full extension guide | Installer architecture
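
The discovery rule described in this section, where an extension is a folder containing both `manifest.yaml` and `compose.yaml`, can be sketched in a few lines. The function name `discover_extensions` is hypothetical, and the real tooling presumably also parses the manifest, which this sketch does not attempt.

```shell
# Hypothetical helper: list folders under a services directory that
# contain both manifest.yaml and compose.yaml, one name per line.
discover_extensions() {
  local services_dir="$1"
  local dir
  for dir in "$services_dir"/*/; do
    # An unmatched glob stays literal, so skip non-directories.
    [ -d "$dir" ] || continue
    if [ -f "${dir}manifest.yaml" ] && [ -f "${dir}compose.yaml" ]; then
      basename "$dir"
    fi
  done
}
```

Running `discover_extensions extensions/services` from the project root would print `my-service` for the layout shown above, and skip any folder missing one of the two files.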
The dream CLI manages your entire stack:
dream status # Health checks + GPU status
dream list # All services and their state
dream logs llm # Tail logs (aliases: llm, stt, tts)
dream restart [service] # Restart one or all services
dream start / stop # Start or stop the stack
dream mode cloud # Switch to cloud APIs via LiteLLM
dream mode local # Switch back to local inference
dream mode hybrid # Local primary, cloud fallback
dream model swap T3 # Switch to a different hardware tier
dream enable n8n # Enable an extension
dream disable whisper # Disable one
dream config show # View .env (secrets masked)
dream preset save gaming # Snapshot current config
dream preset load gaming # Restore it

| | Dream Server | Ollama + Open WebUI | LocalAI |
|---|---|---|---|
| One-command full-stack install | LLM + agents + workflows + RAG + voice + images | LLM + chat only | LLM only |
| Hardware auto-detect + model selection | NVIDIA + AMD Strix Halo | No | No |
| AMD APU unified memory support | ROCm + llama-server | Partial (Vulkan) | No |
| Autonomous AI agents | OpenClaw | No | No |
| Workflow automation | n8n (400+ integrations) | No | No |
| Voice (STT + TTS) | Whisper + Kokoro | No | No |
| Image generation | ComfyUI | No | No |
| RAG pipeline | Qdrant + embeddings | No | No |
| Extension system | Manifest-based, hot-pluggable | No | No |
| Multi-GPU | Yes (NVIDIA) | Partial | Partial |

| Guide | Description |
|---|---|
| Quickstart | Step-by-step install guide with troubleshooting |
| Hardware Guide | What to buy, tier recommendations |
| FAQ | Common questions and configuration |
| Extensions | How to add custom services |
| Installer Architecture | Modular installer deep dive |
| Changelog | Version history and release notes |
| Contributing | How to contribute |
Dream Server exists because of the incredible people, projects, and communities that make open-source AI possible. We are grateful to every contributor, maintainer, and tinkerer whose work powers this stack.
Thanks to kyuz0 for amd-strix-halo-toolboxes — pre-built ROCm containers for Strix Halo that spared us the pain of building our own. And to lhl for strix-halo-testing — the foundational Strix Halo AI research and rocWMMA performance work that the broader community builds on.
- llama.cpp (ggerganov) — LLM inference engine
- Qwen (Alibaba Cloud) — Default language models
- Open WebUI — Chat interface
- ComfyUI — Image generation engine
- FLUX.1 (Black Forest Labs) — Image generation model
- AMD ROCm — GPU compute platform
- AMD Strix Halo Toolboxes (kyuz0) — Pre-built ROCm containers for AMD inference
- Strix Halo Testing (lhl) — Foundational Strix Halo AI research and rocWMMA optimizations
- n8n — Workflow automation
- Qdrant — Vector database
- SearXNG — Privacy-respecting search
- Perplexica — AI-powered search
- LiteLLM — LLM API gateway
- Kokoro FastAPI (remsky) — Text-to-speech
- Speaches — Speech-to-text
- Strix Halo Home Lab — Community knowledge base
- Yasin Bursali (yasinBursali) — Fixed CI workflow discovery, added dashboard-api router test coverage with security-focused tests (auth enforcement, path traversal protection), documented all 14 undocumented extension services, fixed macOS disk space preflight to check the correct volume for external drive installs, moved embeddings platform override to prevent orphaned service errors when RAG is disabled, fixed macOS portability issues restoring broken Apple Silicon Neural Engine detection (GNU date/grep to POSIX), fixed docker compose failure diagnostic unreachable under pipefail, added stderr warning on manifest parse failure in compose resolver, fixed socket FD leak in dashboard-api, and added open-webui health gate to prevent 502 errors during model warmup
- latentcollapse (Matt C) — Security audit and hardening: OpenClaw localhost binding fix, multi-GPU VRAM detection, AMD dashboard hardening, and the Agent Policy Engine (APE) extension
- Igor Lins e Silva (igorls) — Stability audit fixing 9 infrastructure bugs: dynamic compose discovery in backup/restore/update scripts, Token Spy persistent storage and connection pool hardening, dotglob rollback fix, and systemd auto-resume service correction
- Nino Skopac (NinoSkopac) — Token Spy dashboard improvements: shared metric normalization with parity tests, budget and active session tracking, configurable secure CORS replacing wildcard origins, and DB backend compatibility shim for sidecar migration
- Glexy (fullstackdev0110) — Fixed dream-cli chat port initialization bug, hardened validate.sh environment variable handling with safer quoting and .env parsing, removed all `eval` usage from installer/preflight env parsing, and added a safe-env loader (lib/safe-env.sh) to prevent shell injection
- bugman-007 — Parallelized health checks in dream status for 5–10× speedup using async gather with proper timeout handling, benchmark and test scripts, integrated backup/restore commands into dream-cli, and added preset import/export with path traversal protection and archive validation
- norfrt6-lab — Replaced 12+ silent exception-swallowing patterns with specific exception types and proper logging, added cross-platform system metrics (macOS/Windows) for uptime, CPU, and RAM, plus Apple Silicon GPU detection via sysctl/vm_stat
If we missed anyone, open an issue. We want to get this right.
Apache 2.0 — Use it, modify it, ship it. See LICENSE.
Built by Light Heart Labs and The Collective


