A robust, production-grade Python wrapper for the Lemonade C++ Backend.
This SDK provides a clean, pythonic interface for interacting with local LLMs running on Lemonade. It was built to power Sorana (a visual workspace for AI), extracting the core integration logic into a standalone, open-source library for the developer community.
- Auto-Discovery: Automatically scans multiple ports and hosts to find active Lemonade instances.
- Low-Overhead Architecture: Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
- Health Checks & Recovery: Built-in utilities to verify server status and handle connection drops.
- Type-Safe Client: Full Python type hinting for better developer experience (IDE autocompletion).
- Model Management: Simple API to load, unload, and list models dynamically.
pip install .Alternatively, you can install it directly from GitHub:
pip install git+[https://github.com/Tetramatrix/lemonade-python-sdk.git](https://github.com/Tetramatrix/lemonade-python-sdk.git)The SDK automatically handles port discovery, so you don't need to hardcode localhost:8000.
from lemonade_integration.client import LemonadeClient
from lemonade_integration.port_scanner import find_available_lemonade_port
# Auto-discover running instance
port = find_available_lemonade_port()
if port:
client = LemonadeClient(base_url=f"http://localhost:{port}")
if client.health_check():
print(f"Connected to Lemonade on port {port}")
else:
print("No Lemonade instance found.")response = client.chat_completion(
model="Llama-3-8B-Instruct",
messages=[
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "Write a Hello World in C++"}
],
temperature=0.7
)
print(response['choices'][0]['message']['content'])# List all available models
models = client.list_models()
for m in models:
print(f"Found model: {m['id']}")
# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")This SDK was extracted from the core engine of Sorana, a professional visual workspace for AI. It demonstrates the SDK's capability to handle complex, real-world requirements on AMD Ryzen AI hardware:
- Low Latency: Powers sub-second response times for multi-model chat interfaces.
- Dynamic Workflows: Manages the loading and unloading of 20+ different LLMs based on user activity to optimize local NPU/GPU memory.
- Zero-Config UX: Uses the built-in port scanner to automatically connect the Sorana frontend to the Lemonade backend without user intervention.
- client.py: Main entry point for API interactions.
- port_scanner.py: Utilities for detecting Lemonade instances across ports (8000-9000).
- model_discovery.py: Logic for fetching and parsing model metadata.
- request_builder.py: Helper functions to construct compliant payloads.
Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.
This project is licensed under the MIT License - see the LICENSE file for details.