Python client for high-performance LLM inference on Apple Silicon.
## Installation

```bash
pip install orchard
```

## Quickstart

```python
from orchard import Client

client = Client()
response = client.chat(
model="meta-llama/Llama-3.1-8B-Instruct",
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.text)
```

## Streaming

```python
for delta in client.chat(model="...", messages=[...], stream=True):
    print(delta.content, end="", flush=True)
```
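
If you also need the complete reply after the stream finishes, a minimal sketch (using only the streaming API shown above) is to accumulate the deltas as they arrive:

```python
# Sketch: print tokens live while also collecting the full reply.
# Assumes delta.content is always a string, as in the example above.
chunks = []
for delta in client.chat(model="...", messages=[...], stream=True):
    chunks.append(delta.content)
    print(delta.content, end="", flush=True)

full_text = "".join(chunks)
```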
model="...",
conversations=[
[{"role": "user", "content": "Question 1"}],
[{"role": "user", "content": "Question 2"}],
],
)
```

## Chat templates

Chat templates and control tokens are loaded from the Pantheon submodule at `orchard/formatter/profiles/`. This provides a single source of truth shared across all Orchard SDKs (Python, Rust, Swift). See that repo for the list of supported model families.
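
As a quick way to see which profiles your installed copy ships with, one option is to list the bundled profile files. This is a sketch, not a documented API: it assumes the profiles directory is packaged as importable data under `orchard.formatter`, and the file names are whatever Pantheon provides.

```python
# Sketch: list the Pantheon profile files bundled with the installed package.
# Assumes orchard/formatter/profiles/ ships as package data; the exact file
# names and extensions are an assumption, not part of the public API.
from importlib import resources

profiles_dir = resources.files("orchard.formatter") / "profiles"
for profile in sorted(profiles_dir.iterdir(), key=lambda p: p.name):
    print(profile.name)
```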

## Requirements

- Python 3.10+
- macOS 14+ (Apple Silicon)
- PIE (Proxy Inference Engine)

## Related projects

- orchard-rs - Rust client
- orchard-swift - Swift client
- Pantheon - Model profiles

## License

Apache-2.0