Llama Stack


Quick Start | Documentation | OpenAI API Compatibility | Discord

Open-source agentic API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.

Llama Stack is a drop-in replacement for the OpenAI API that you can run anywhere — your laptop, your datacenter, or the cloud. Use any OpenAI-compatible client or agentic framework. Swap between Llama, GPT, Gemini, Mistral, or any model without changing your application code.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)

What you get

  • Chat Completions & Embeddings — standard /v1/chat/completions, /v1/completions, and /v1/embeddings endpoints, compatible with any OpenAI client
  • Responses API — server-side agentic orchestration with tool calling, MCP server integration, and built-in file search (RAG) in a single API call (learn more)
  • Vector Stores & Files — /v1/vector_stores and /v1/files for managed document storage and search
  • Batches — /v1/batches for offline batch processing
  • Open Responses conformant — the Responses API implementation passes the Open Responses conformance test suite
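
As a rough sketch, a single Responses API request can combine built-in file search and an MCP server tool. The field names below follow the OpenAI Responses API shape; the model name, vector store ID, and MCP server URL are illustrative placeholders, not values from this repository:

```python
import json

# Illustrative request body for a single /v1/responses call that mixes
# built-in file search (RAG) with an external MCP server tool.
# The model name, vector store ID, and server URL are placeholders.
request_body = {
    "model": "llama-3.3-70b",
    "input": "What does the design doc say about batching?",
    "tools": [
        {"type": "file_search", "vector_store_ids": ["vs_abc123"]},
        {"type": "mcp", "server_label": "docs", "server_url": "http://localhost:3000/mcp"},
    ],
}

print(json.dumps(request_body, indent=2))
```

Both tools are resolved server-side in one call, so the client does not orchestrate the retrieval or tool-call loop itself.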

Use any model, use any infrastructure

Llama Stack has a pluggable provider architecture. Develop locally with Ollama, deploy to production with vLLM, or connect to a managed service — the API stays the same.

┌─────────────────────────────────────────────────────────────────────────┐
│                          Llama Stack Server                             │
│               (same API, same code, any environment)                    │
│                                                                         │
│  /v1/chat/completions  /v1/responses  /v1/vector_stores  /v1/files      │
│  /v1/embeddings        /v1/batches    /v1/models         /v1/connectors │
├───────────────────┬──────────────────┬──────────────────────────────────┤
│  Inference        │  Vector stores   │  Tools & connectors              │
│    Ollama         │    FAISS         │    MCP servers                   │
│    vLLM, TGI      │    Milvus        │    Brave, Tavily (web search)    │
│    AWS Bedrock    │    Qdrant        │    File search (built-in RAG)    │
│    Azure OpenAI   │    PGVector      │                                  │
│    Fireworks      │    ChromaDB      │  File storage & processing       │
│    Together       │    Weaviate      │    Local filesystem, S3          │
│    ...15+ more    │    Elasticsearch │    PDF, HTML (file processors)   │
│                   │    SQLite-vec    │                                  │
└───────────────────┴──────────────────┴──────────────────────────────────┘

See the provider documentation for the full list.
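
Because the request shape is identical regardless of backend, switching providers amounts to changing the model identifier. A minimal sketch (the model names below are illustrative, not the exact identifiers a given distribution registers):

```python
# The same OpenAI-style chat request works against any configured provider;
# only the model identifier changes. Model names here are illustrative.
def chat_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

dev = chat_request("ollama/llama3.2:3b", "Hello")   # local development via Ollama
prod = chat_request("vllm/llama-3.3-70b", "Hello")  # production via vLLM

# Everything except the model identifier is unchanged.
assert dev["messages"] == prod["messages"]
```
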

Get started

Install and run a Llama Stack server:

# One-line install
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash

# Or install via uv
uv pip install llama-stack

# Start the server (uses the starter distribution with Ollama)
llama stack run

Then connect with any OpenAI client — Python, TypeScript, curl, or any framework that speaks the OpenAI API.
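
Since the server speaks plain OpenAI-style HTTP, even the Python standard library is enough. This sketch builds the request with urllib against the default port shown above; the network call itself is commented out because it needs a running server:

```python
import json
import urllib.request

# Chat-completions endpoint of a local Llama Stack server (default port 8321).
url = "http://localhost:8321/v1/chat/completions"

payload = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer fake"},
)
# with urllib.request.urlopen(req) as resp:   # requires a running server
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The `Authorization` header can carry any placeholder value for a local server, mirroring the `api_key="fake"` in the Python example above.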

See the Quick Start guide for detailed setup.

Resources

Client SDKs:

  • Python — llama-stack-client-python
  • TypeScript — llama-stack-client-typescript

Community

We hold regular community calls every Thursday at 09:00 AM PST — see the Community Event on Discord for details.


Thanks to all our amazing contributors!

