docs: rewrite README and docs to lead with OpenAI API compatibility #5323

leseb merged 9 commits into llamastack:main
Conversation
Rewrite the README, docs landing page, API overview, and OpenAI compatibility page to reflect the current state of the project. The project has evolved from a "standardized Gen AI API" to an OpenAI-compatible API server with pluggable providers. The new messaging leads with what users care about: drop-in compatibility with the OpenAI API, any model, any infrastructure.

Key changes:

- README leads with "OpenAI-compatible API server" and a code snippet
- APIs are described by their actual endpoints, not internal categories
- Responses API (agentic orchestration, MCP, file_search) is featured
- Provider architecture shown as a local-to-production concept, not a table
- Open Responses conformance mentioned
- OpenAI compat page trimmed from 230 lines of filler to focused content
- API overview page now lists actual endpoints
- pyproject.toml description updated for PyPI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@raghotham @franciscojavierarceo @cdoern @mattf as discussed in today's community call.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
> - OpenAI API compatibility
> - Cloud-based execution
> - Scalable infrastructure
>
> ## Implemented endpoints
We should also mention the non-OpenAI APIs that are API adjacent like Prompts and File Processor. Eventually we would add Memory to that list too.
docs/docs/index.mdx (outdated)
> # Welcome to Llama Stack
>
> Llama Stack is the open-source framework for building generative AI applications.
> **Open-source API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.**
Suggested change:

> **Open-source API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.**
> **Open-source Agentic API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.**
docs/docs/api-openai/index.mdx (outdated)
> | Embeddings | `/v1/embeddings` | Text embeddings |
> | Models | `/v1/models` | Model listing and management |
> | Files | `/v1/files` | File upload and management |
> | Vector Stores | `/v1/vector_stores` | Document storage and semantic search |
actually we support more than just semantic search
> - **Safety** — content moderation via Llama Guard
> - **[Open Responses](https://www.openresponses.org/) conformant** — the Responses API implementation passes the Open Responses conformance test suite
>
> ## Use any model, use any infrastructure
can we outline RAG in this diagram a little more? Right now it only shows inference provider plugin
franciscojavierarceo left a comment:

some small suggestions but otherwise lgtm
> **Open-source API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.**
>
> ## Overview
>
> Llama Stack is a drop-in replacement for the OpenAI API that you can run anywhere — your laptop, your datacenter, or the cloud. Use any OpenAI-compatible client or agentic framework. Swap between Llama, GPT, Gemini, Mistral, or any model without changing your application code.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
```
Here are more demos for the openai responses api with the openai client: opendatahub-io/llama-stack-demos#324
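For context, a minimal sketch of what a Responses API request body can look like when it uses the built-in file_search tool the PR highlights; the model name, input text, and vector store id below are illustrative placeholders, not values taken from this PR or the linked demos.

```python
import json

# Illustrative request body for the /v1/responses endpoint of a local
# Llama Stack server; all identifiers here are placeholders.
BASE_URL = "http://localhost:8321/v1"

payload = {
    "model": "llama-3.3-70b",
    "input": "What changed in the latest release?",
    # The built-in file_search tool performs retrieval (RAG) against a
    # managed vector store in the same API call.
    "tools": [{"type": "file_search", "vector_store_ids": ["vs_example"]}],
}

request_body = json.dumps(payload)
print(f"POST {BASE_URL}/responses")
print(request_body)
```

Sending this body to a running server with any HTTP client (or via the OpenAI SDK's `client.responses.create`) is what the "single API call" framing in the new docs refers to.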
README.md (outdated)
> - **Responses API** — server-side agentic orchestration with tool calling, MCP server integration, and built-in file search (RAG) in a single API call ([learn more](https://llamastack.github.io/docs/api-openai))
> - **Vector Stores & Files** — `/v1/vector_stores` and `/v1/files` for managed document storage and retrieval
> - **Batches** — `/v1/batches` for offline batch processing
> - **Safety** — content moderation via Llama Guard
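As a hedged sketch of the two-step setup behind the Vector Stores & Files bullets — upload a file via `/v1/files`, then reference its id when creating a vector store via `/v1/vector_stores` — the request bodies might look like this; every identifier (`file-abc123`, `docs-store`) is a placeholder for illustration only.

```python
import json

# Step 1: upload a document. The file bytes travel as multipart form
# data; the JSON fields just declare the upload's purpose.
file_request = {"purpose": "assistants"}

# Step 2: create a vector store that indexes the uploaded file, using
# the id the /v1/files endpoint returned (placeholder shown here).
store_request = {
    "name": "docs-store",
    "file_ids": ["file-abc123"],
}

print("POST /v1/files         ->", json.dumps(file_request))
print("POST /v1/vector_stores ->", json.dumps(store_request))
```

Once the store exists, the Responses API's file_search tool can retrieve from it, which is the managed RAG flow the README bullets describe.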
- Add "agentic" to tagline per franciscojavierarceo suggestion
- Remove Safety/Moderations (being removed in llamastack#5291)
- Use uv instead of pip in install instructions
- Remove Swift and Kotlin from SDK table
- Fix "semantic search" to just "search" for vector stores
- Mention non-OpenAI APIs (Prompts, File Processors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Replace the flow diagram with a server architecture view that shows the API endpoints alongside both inference and vector store providers. This addresses the feedback that RAG/vector stores were missing from the diagram.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

Add a third column showing tools & connectors (MCP servers, web search, file search/RAG) and file storage (local filesystem, S3). Add /v1/connectors to the API endpoints row.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

Remind agents to update the ASCII architecture diagram in README.md when adding or removing providers, APIs, or backend integrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
All comments addressed, merging now, thanks!
Summary
The project's public-facing docs haven't kept up with how the project has evolved. The README opens with "standardizes the core building blocks" — a year-old message that could describe any AI framework. The word "OpenAI" appears zero times. The Responses API is invisible.
This PR rewrites the README, docs landing page, API overview, and OpenAI compatibility page to reflect what the project actually is today: an OpenAI-compatible API server with pluggable providers.
What changed
README.md — full rewrite
- `base_url` swap with the OpenAI client
- APIs described by their actual endpoints (`/v1/chat/completions`, `/v1/responses`, etc.)

docs/docs/index.mdx — same messaging applied to the docs landing page
docs/docs/api-openai/index.mdx — trimmed from 230 lines to ~80
docs/docs/api-overview.md — now lists actual endpoints
pyproject.toml — description updated from "Llama Stack" to a real PyPI description
What was removed
Test plan
🤖 Generated with Claude Code