
Orchard

Python client for high-performance LLM inference on Apple Silicon.

Installation

pip install orchard

Usage

from orchard import Client

client = Client()

response = client.chat(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.text)

Streaming

for delta in client.chat(model="...", messages=[...], stream=True):
    print(delta.content, end="", flush=True)
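
Passing stream=True makes chat yield deltas instead of returning a single response. A minimal sketch for collecting the full reply while streaming, assuming each delta.content is a text fragment as in the loop above:

chunks = []
for delta in client.chat(model="...", messages=[...], stream=True):
    print(delta.content, end="", flush=True)  # render text as it arrives
    chunks.append(delta.content)              # keep fragments for later use
full_text = "".join(chunks)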

Batch Inference

responses = client.chat_batch(
    model="...",
    conversations=[
        [{"role": "user", "content": "Question 1"}],
        [{"role": "user", "content": "Question 2"}],
    ],
)
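
Each conversation produces one response. Assuming batch responses expose the same text attribute as single-shot responses (an assumption based on the example above), the results can be read back directly:

for response in responses:
    print(response.text)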

Model Profiles

Chat templates and control tokens are loaded from the Pantheon submodule at orchard/formatter/profiles/. This provides a single source of truth shared across all Orchard SDKs (Python, Rust, Swift). See that repo for the list of supported model families.
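
If the profiles are shipped as package data (an assumption; packaging details aren't documented here), the bundled profile files can be inspected with the standard library:

from importlib import resources

# Hypothetical inspection: assumes orchard.formatter.profiles is an
# importable package containing the Pantheon profile files.
for entry in resources.files("orchard.formatter.profiles").iterdir():
    print(entry.name)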

Requirements

  • Python 3.10+
  • macOS 14+ (Apple Silicon)
  • PIE (Proxy Inference Engine)

License

Apache-2.0
