Zero dependencies. Maximum performance. One unified API.
Installation • Quick Start • Providers • Features • Docs
| Feature | ArcLLM | Others |
|---|---|---|
| Dependencies | 0 (stdlib only) | 10-50+ packages |
| Install size | ~100KB | 50-200MB |
| Cold start | ~10ms | 500ms-2s |
| API | OpenAI-compatible | Varies |
ArcLLM is built for developers who want speed, simplicity, and reliability when working with LLMs.
## Installation

```bash
pip install arcllm
```

That's it. No dependency hell. No version conflicts. Just works.
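ArcLLM reads provider credentials from environment variables (one per provider; see the table in the Providers section). For example, if you are calling OpenAI:

```shell
# Set the key for whichever provider you plan to call
export OPENAI_API_KEY="sk-..."
```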
## Quick Start

```python
import arcllm

# Simple completion
response = arcllm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

### Streaming

```python
stream = arcllm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

### Async

```python
response = await arcllm.acompletion(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
```

## Providers

```python
# OpenAI
response = arcllm.completion(model="gpt-4o", messages=messages)

# Anthropic
response = arcllm.completion(model="anthropic/claude-3-5-sonnet-latest", messages=messages)

# Google Gemini
response = arcllm.completion(model="gemini/gemini-1.5-pro", messages=messages)

# Groq (ultra-fast inference)
response = arcllm.completion(model="groq/llama-3.3-70b-versatile", messages=messages)

# Local with Ollama
response = arcllm.completion(model="ollama/llama3.2", messages=messages)
```

| Provider | Prefix | Models | Environment Variable |
|---|---|---|---|
| OpenAI | openai/ | GPT-4o, GPT-4, o1, o3 | OPENAI_API_KEY |
| Anthropic | anthropic/ | Claude 3.5, Claude 3 | ANTHROPIC_API_KEY |
| Google Gemini | gemini/ | Gemini 1.5, Gemini 2.0 | GEMINI_API_KEY |
| Azure OpenAI | azure/ | GPT-4o, GPT-4 | AZURE_OPENAI_API_KEY |
| AWS Bedrock | bedrock/ | Claude, Llama, Titan | AWS credentials |
| Google Vertex | vertex_ai/ | Gemini, PaLM | GOOGLE_ACCESS_TOKEN |
| Mistral | mistral/ | Mistral Large, Codestral | MISTRAL_API_KEY |
| Groq | groq/ | Llama 3.3, Mixtral | GROQ_API_KEY |
| Together AI | together_ai/ | Llama, Mixtral, Qwen | TOGETHER_API_KEY |
| Fireworks | fireworks_ai/ | Llama, Mixtral | FIREWORKS_API_KEY |
| DeepSeek | deepseek/ | DeepSeek V3, Coder | DEEPSEEK_API_KEY |
| Perplexity | perplexity/ | Sonar, Online | PERPLEXITY_API_KEY |
| Cohere | cohere/ | Command R+ | COHERE_API_KEY |
| Databricks | databricks/ | DBRX, Llama | DATABRICKS_TOKEN |
| Ollama | ollama/ | Any local model | (local) |
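The prefix convention is purely string-based: everything before the first `/` selects the provider, and a bare model name falls through to a default. A minimal sketch of that routing idea (hypothetical, not ArcLLM's actual internals):

```python
def split_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split 'provider/model-name' into (provider, model-name).

    A bare name like 'gpt-4o' is routed to the default provider.
    """
    provider, sep, name = model.partition("/")
    if not sep:  # no prefix present
        return default_provider, model
    return provider, name

print(split_model("anthropic/claude-3-5-sonnet-latest"))  # ('anthropic', 'claude-3-5-sonnet-latest')
print(split_model("gpt-4o"))                              # ('openai', 'gpt-4o')
```

Using `partition` rather than `split` keeps multi-slash names (e.g. `together_ai/meta-llama/...`) intact after the provider prefix.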
## Features

### Tool Calling

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = arcllm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        print(f"Call: {tool_call.function.name}({tool_call.function.arguments})")
```
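In the OpenAI-compatible format, `tool_call.function.arguments` is a JSON string, so executing a requested tool comes down to decoding it and dispatching by name. A sketch of that glue code (the `get_weather` implementation here is a stand-in, not part of ArcLLM):

```python
import json

def get_weather(location: str) -> str:
    # Stand-in implementation; a real tool would call a weather API.
    return f"Sunny in {location}"

TOOLS = {"get_weather": get_weather}

def run_tool(name: str, arguments: str) -> str:
    """Decode the model's JSON arguments and call the matching local function."""
    kwargs = json.loads(arguments)
    return TOOLS[name](**kwargs)

print(run_tool("get_weather", '{"location": "Tokyo"}'))  # Sunny in Tokyo
```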
### Structured Output

```python
response = arcllm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Generate a user profile"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "interests": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["name", "age"]
            }
        }
    }
)
```
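With a `json_schema` response format, the message content comes back as a JSON document matching the schema, so it can be decoded directly with the standard library. A small sketch with a sample payload (illustrative, not a real API response):

```python
import json

# Example payload shaped like the user_profile schema above
content = '{"name": "Ada", "age": 36, "interests": ["math", "engines"]}'

profile = json.loads(content)
print(profile["name"], profile["age"])  # Ada 36
```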
### Vision

```python
response = arcllm.completion(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
```

### Embeddings

```python
response = arcllm.embedding(
    model="text-embedding-3-small",
    input=["Hello world", "Goodbye world"]
)
print(f"Dimensions: {len(response.data[0].embedding)}")
```
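Embedding vectors are typically compared with cosine similarity, which needs nothing beyond the standard library. A self-contained sketch (the vectors here are toy values, not real embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```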
### Cost Tracking

```python
response = arcllm.completion(model="gpt-4o", messages=messages)

# Calculate cost
cost = arcllm.completion_cost(response)
print(f"Cost: ${cost:.6f}")

# Or get per-token pricing
input_cost, output_cost = arcllm.cost_per_token(
    model="gpt-4o",
    prompt_tokens=1000,
    completion_tokens=500
)
```
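Per-token pricing is just arithmetic over per-million-token rates, so `cost_per_token` results are easy to sanity-check by hand. The rates below are illustrative, not ArcLLM's actual pricing table:

```python
def manual_cost(prompt_tokens: int, completion_tokens: int,
                input_per_million: float, output_per_million: float) -> float:
    """Recompute a request's cost from hypothetical per-million-token rates."""
    return (prompt_tokens * input_per_million
            + completion_tokens * output_per_million) / 1_000_000

# 1000 prompt tokens at $2.50/M plus 500 completion tokens at $10.00/M
print(f"${manual_cost(1000, 500, 2.50, 10.00):.6f}")  # $0.007500
```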
### Model Capabilities

```python
# Check what models can do
arcllm.supports_vision("gpt-4o")                    # True
arcllm.supports_vision("gpt-3.5-turbo")             # False
arcllm.supports_tools("claude-3-5-sonnet-latest")   # True
arcllm.supports_pdf_input("gemini-1.5-pro")         # True
arcllm.get_max_tokens("gpt-4o")                     # 16384
```
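Capability checks like these make it easy to gate features at call time, e.g. picking the first candidate model that can handle images. A sketch with a stubbed-out checker (in real code you would pass `arcllm.supports_vision`):

```python
from typing import Callable, Optional

def first_vision_model(candidates: list[str],
                       supports_vision: Callable[[str], bool]) -> Optional[str]:
    """Return the first model in the list that the checker says handles images."""
    for model in candidates:
        if supports_vision(model):
            return model
    return None

# Stub checker standing in for arcllm.supports_vision
stub = {"gpt-3.5-turbo": False, "gpt-4o": True}.get
print(first_vision_model(["gpt-3.5-turbo", "gpt-4o"], stub))  # gpt-4o
```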
### Error Handling

```python
from arcllm import (
    ArcLLMError,
    AuthenticationError,
    RateLimitError,
    TimeoutError,
)

try:
    response = arcllm.completion(model="gpt-4o", messages=messages)
except AuthenticationError:
    print("Check your API key")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except TimeoutError:
    print("Request timed out")
except ArcLLMError as e:
    print(f"Error: {e.message}")
```
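The typed exceptions compose naturally with a retry loop, e.g. backing off on rate limits while letting auth errors fail fast. A generic sketch, written against plain callables so it runs without ArcLLM (with ArcLLM you would pass `retryable=RateLimitError` and could seed the delay from `e.retry_after`):

```python
import time

def call_with_backoff(fn, retryable, max_attempts=5, base_delay=0.1):
    """Call fn(), retrying on `retryable` exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo: a flaky callable that succeeds on its third call
class Flaky:
    def __init__(self):
        self.calls = 0
    def __call__(self):
        self.calls += 1
        if self.calls < 3:
            raise TimeoutError("transient")
        return "ok"

print(call_with_backoff(Flaky(), retryable=TimeoutError, base_delay=0.0))  # ok
```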
### Configuration

```python
# Per-request configuration
response = arcllm.completion(
    model="gpt-4o",
    messages=messages,
    api_key="sk-...",          # Override API key
    api_base="https://...",    # Custom endpoint
    timeout=120.0,             # Request timeout
    max_retries=5,             # Retry count
)

# Azure OpenAI
response = arcllm.completion(
    model="azure/my-deployment",
    messages=messages,
    api_base="https://myresource.openai.azure.com",
    api_version="2024-10-21",
)
```

## Migration from LiteLLM

ArcLLM is designed as a drop-in replacement:
```python
# Before
import litellm
response = litellm.completion(model="gpt-4o", messages=messages)

# After
import arcllm
response = arcllm.completion(model="gpt-4o", messages=messages)

# Or alias it
import arcllm as litellm
response = litellm.completion(model="gpt-4o", messages=messages)
```

An arc is the shortest path between two points. ArcLLM is the shortest path between your code and any LLM provider: minimal, direct, efficient.
Apache 2.0 - see LICENSE
Built with ❤️ for developers who value simplicity