This project provides a FastAPI-based proxy server that exposes an OpenAI-compatible API (/v1/chat/completions) for the cascadeflow library. This allows you to use cascadeflow with tools and extensions that support the OpenAI API format, such as the Continue VSCode extension.
## Features

- OpenAI Compatibility: Implements the `/v1/chat/completions` endpoint, accepting standard OpenAI chat completion requests.
- Streaming Support: Fully supports streaming responses via Server-Sent Events (SSE).
- Configurable Models: Define your models, providers, and costs in a simple YAML configuration file.
- Cascadeflow Integration: Leverages the `CascadeAgent` to orchestrate model interactions.
## Requirements

- Python 3.8+
- `cascadeflow` library installed (included in requirements)
## Installation

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd cascadeflow-openai-proxy
  ```

- Install dependencies. It is recommended to use a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```
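## Running the Server

Start the proxy with uvicorn. This is a sketch that assumes the FastAPI app is exposed as `app` in `main.py`; adjust the module path to match the repository's actual entrypoint.

```bash
# Hypothetical entrypoint: replace main:app with the actual module:attribute.
uvicorn main:app --host 0.0.0.0 --port 8000
```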
The server will start on http://0.0.0.0:8000.
For convenience, a script is provided to start multiple Ollama instances and the proxy together. This is useful if you want to serve different models on different ports.
```bash
./start_services.sh
```

This script will:

- Start an Ollama instance on port 11434 (default).
- Start a second Ollama instance on port 11435.
- Start the `cascadeflow-openai-proxy`.
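The authoritative version of `start_services.sh` is in the repository; as a rough sketch under the same assumption as above (proxy entrypoint in `main.py`), it does something like:

```bash
#!/usr/bin/env bash
set -euo pipefail

# First Ollama instance on the default port.
OLLAMA_HOST=127.0.0.1:11434 ollama serve &

# Second Ollama instance on an alternate port.
OLLAMA_HOST=127.0.0.1:11435 ollama serve &

# Give both instances a moment to bind, then start the proxy.
sleep 2
uvicorn main:app --host 0.0.0.0 --port 8000
```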
## Configuration

The `config.yaml` file defines the models available to the proxy. You can configure multiple models with different providers and specific URLs.

Example `config.yaml`:
```yaml
models:
  - name: qwen3:1.7b
    provider: ollama
    url: http://localhost:11434
    cost: 0.0
  - name: ministral-3:8b
    provider: ollama
    url: http://localhost:11435
    cost: 0.0
  - name: claude-3-5-sonnet-20241022
    provider: anthropic
    cost: 0.003
```

- `name`: The model name to be used in API requests.
- `provider`: The provider name (e.g., `openai`, `anthropic`, `ollama`).
- `url`: (Optional) The base URL for the provider (useful for local models like Ollama).
- `cost`: (Optional) Cost per token or request.
## Usage

You can now point any OpenAI-compatible client to your proxy.

Base URL: `http://localhost:8000/v1`
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

To use this with Continue, add the following to your `config.json`:
```json
{
  "models": [
    {
      "title": "Cascadeflow Proxy",
      "provider": "openai",
      "model": "gpt-4o-mini",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "EMPTY"
    }
  ]
}
```

Note: The `apiKey` field is required by some clients but is ignored by the proxy, which reads provider credentials from server-side environment variables.
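For providers that do need credentials (e.g. the `anthropic` model in the example config above), export the keys before starting the proxy. A sketch, assuming cascadeflow picks up the usual provider environment variables:

```bash
# Assumed variable names; consult the cascadeflow docs for the exact ones.
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
```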
## API Endpoints

- `POST /v1/chat/completions`: Handles chat completion requests. Supports both streaming (`"stream": true`) and non-streaming modes, as shown in the example below.
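For example, a streaming request (assuming the proxy emits standard OpenAI-style SSE chunks terminated by `data: [DONE]`):

```bash
# -N disables curl's output buffering so SSE chunks print as they arrive.
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```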