Lemonade helps users discover and run local AI apps by serving optimized LLMs, images, and speech right from their own GPUs and NPUs.
Apps like n8n, VS Code Copilot, Morphik, and many more use Lemonade to seamlessly run generative AI on any PC.
- Install: Windows · Linux · Docker · Source
- Get Models: Browse and download with the Model Manager
- Generate: Try models with the built-in interfaces for chat, image gen, speech gen, and more
- Mobile: Take your lemonade to go: iOS · Android · Source
- Connect: Use Lemonade with your favorite apps:
View all apps →
Want your app featured here? Just submit a marketplace PR!
To run and chat with Gemma 3:
lemonade-server run Gemma-3-4b-it-GGUF
More modalities:
# image gen
lemonade-server run SDXL-Turbo
# speech gen
lemonade-server run kokoro-v1
# transcription
lemonade-server run Whisper-Large-v3-Turbo
To see the available models and download them:
lemonade-server list
lemonade-server pull Gemma-3-4b-it-GGUF
To see the backends available on your PC:
lemonade-server recipes
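For scripted setups, the commands above can be driven from Python. The following is a minimal sketch, not part of Lemonade itself: the helper names `lemonade_command` and `pull_models` are hypothetical, and it assumes `lemonade-server` is on your `PATH` and exits with a non-zero status on failure (this README does not confirm the exit-code behavior).

```python
import subprocess

# Subcommands taken from the examples in this README.
KNOWN_SUBCOMMANDS = {"run", "pull", "list", "recipes"}


def lemonade_command(subcommand, *args):
    """Build the argument list for a lemonade-server call (hypothetical helper)."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return ["lemonade-server", subcommand, *args]


def pull_models(model_names):
    """Pull each model in turn; assumes lemonade-server is on PATH."""
    for name in model_names:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(lemonade_command("pull", name), check=True)
```

Calling `pull_models(["Gemma-3-4b-it-GGUF", "Whisper-Large-v3-Turbo"])` would then fetch two of the models used in the examples above.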
Lemonade supports a wide variety of models across CPU, GPU, and NPU: LLMs (GGUF, FLM, and ONNX), Whisper, Stable Diffusion, and more.
Use lemonade-server pull or the built-in Model Manager to download models. You can also import custom GGUF/ONNX models from Hugging Face.
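Once models are downloaded, a script can ask the running server which ones it reports. The sketch below uses only the Python standard library and the base URL given in the client section of this README (`http://localhost:8000/api/v1`). It assumes the server exposes the OpenAI-style `GET /models` route and response shape; that route is not documented in this README, and the function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def extract_model_ids(payload):
    """Return the ids from an OpenAI-style model list: {"data": [{"id": ...}]}."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url=BASE_URL):
    """Fetch the model list; assumes the server serves GET <base_url>/models."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as response:
        return extract_model_ids(json.load(response))
```

With the server running, `list_model_ids()` returns a list of model name strings that can be passed as the `model` argument of a chat request.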
Lemonade supports multiple recipes across four modalities (LLM, speech-to-text, text-to-speech, and image generation), and each recipe has its own backends and hardware requirements.
| Modality | Recipe | Backend | Device | OS |
|---|---|---|---|---|
| LLM | llamacpp | vulkan | GPU | Windows, Linux |
| LLM | llamacpp | rocm | Select AMD GPUs* | Windows, Linux |
| LLM | llamacpp | cpu | x86_64 | Windows, Linux |
| LLM | flm | npu | XDNA2 NPU | Windows |
| LLM | ryzenai-llm | npu | XDNA2 NPU | Windows |
| Speech-to-text | whispercpp | npu | XDNA2 NPU | Windows |
| Speech-to-text | whispercpp | cpu | x86_64 | Windows |
| Text-to-speech | kokoro | cpu | x86_64 | Windows, Linux |
| Image generation | sd-cpp | rocm | Selected AMD GPUs | Windows, Linux |
| Image generation | sd-cpp | cpu | x86_64 CPU | Windows, Linux |
To check exactly which recipes/backends are supported on your own machine, run:
lemonade-server recipes
* See supported AMD ROCm platforms
| Architecture | Platform Support | GPU Models |
|---|---|---|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |
| Under Development | Under Consideration | Recently Completed |
|---|---|---|
| macOS | vLLM support | Image generation |
| MLX support | Enhanced custom model usage | Speech-to-text |
| More whisper.cpp backends | | Text-to-speech |
| More SD.cpp backends | | Apps marketplace |
You can use any OpenAI-compatible client library by configuring it to use http://localhost:8000/api/v1 as the base URL. The table below lists official and popular OpenAI clients for different languages.
Feel free to pick and choose your preferred language.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|---|---|---|---|---|---|---|---|---|
| openai-python | openai-cpp | openai-java | openai-dotnet | openai-node | go-openai | ruby-openai | async-openai | openai-php |
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)

For more detailed integration instructions, see the Integration Guide.
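If you would rather not install a client library, the same request can be sent with the Python standard library. This is a sketch under stated assumptions: it targets the OpenAI-style `/chat/completions` route under the base URL above, and it sends no `Authorization` header on the assumption that the server ignores the API key (the client example calls it "required but unused"). The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build a POST request for the OpenAI-style chat completions route."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt, base_url=BASE_URL):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    # Same path the client library exposes as choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server running, `chat("Llama-3.2-1B-Instruct-Hybrid", "What is the capital of France?")` mirrors the client-library example. Keeping the request builder separate from the network call makes the payload easy to inspect or port to another language.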
For answers to frequently asked questions, see our FAQ Guide.
We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.
New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.
This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @Geramy @ramkrishna2910 @siavashhub @sofiageo @superm1 @vgodsoe, and sponsored by AMD. You can reach us by filing an issue, emailing lemonade@amd.com, or joining our Discord.
Free code signing provided by SignPath.io, certificate by SignPath Foundation.
- Committers and reviewers: Maintainers of this repo
- Approvers: Owners
Privacy policy: This program will not transfer any information to other networked systems unless specifically requested by the user or the person installing or operating it. When the user requests it, Lemonade downloads AI models from Hugging Face Hub (see their privacy policy).
This project is:
- Built with C++ (server) and React (app) with ❤️ for the open source community,
- Standing on the shoulders of great tools from:
- Accelerated by mentorship from the OCV Catalyst program.
- Licensed under the Apache 2.0 License.
- Portions of the project are licensed as described in NOTICE.md.











