nuvolos-cloud/cascadeflow-openai-proxy

Cascadeflow OpenAI Proxy

This project provides a FastAPI-based proxy server that exposes an OpenAI-compatible API (/v1/chat/completions) for the cascadeflow library. This allows you to use cascadeflow with tools and extensions that support the OpenAI API format, such as the Continue VSCode extension.

Features

  • OpenAI Compatibility: Implements the /v1/chat/completions endpoint, accepting standard OpenAI chat completion requests.
  • Streaming Support: Fully supports streaming responses via Server-Sent Events (SSE).
  • Configurable Models: Define your models, providers, and costs in a simple YAML configuration file.
  • Cascadeflow Integration: Leverages the CascadeAgent to orchestrate model interactions.

Prerequisites

  • Python 3.8+
  • cascadeflow library installed (included in requirements)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd cascadeflow-openai-proxy
  2. Install dependencies: It is recommended to use a virtual environment.

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt

Running the Server

Once started, the server listens on http://0.0.0.0:8000.
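A typical way to launch a FastAPI app is with uvicorn. The module path below (main:app) is an assumption; adjust it to the actual module and application object in this repository.

```shell
# Launch the proxy (module path main:app is an assumption)
uvicorn main:app --host 0.0.0.0 --port 8000
```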

Using the Startup Script

For convenience, a script is provided to start multiple Ollama instances and the proxy together. This is useful if you want to serve different models on different ports.

./start_services.sh

This script will:

  1. Start an Ollama instance on port 11434 (default).
  2. Start a second Ollama instance on port 11435.
  3. Start the cascadeflow-openai-proxy.
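The script's exact contents are not reproduced here, but a minimal sketch of those three steps could look like the following. It relies on Ollama's OLLAMA_HOST environment variable to bind the second instance to a different port; the uvicorn module path is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 1. Default Ollama instance on port 11434
ollama serve &

# 2. Second Ollama instance on port 11435
OLLAMA_HOST=127.0.0.1:11435 ollama serve &

# 3. The proxy itself (module path main:app is an assumption)
uvicorn main:app --host 0.0.0.0 --port 8000
```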

Configuration

1. Model Configuration (config.yaml)

The config.yaml file defines the models available to the proxy. You can configure multiple models with different providers and specific URLs.

Example config.yaml:

models:
  - name: qwen3:1.7b
    provider: ollama
    url: http://localhost:11434
    cost: 0.0
  - name: ministral-3:8b
    provider: ollama
    url: http://localhost:11435
    cost: 0.0
  - name: claude-3-5-sonnet-20241022
    provider: anthropic
    cost: 0.003

  • name: The model name to be used in API requests.
  • provider: The provider name (e.g., openai, anthropic, ollama).
  • url: (Optional) The base URL for the provider (useful for local models like Ollama).
  • cost: (Optional) Cost per token or request.
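To illustrate the schema, here is a hedged sketch of loading such a file with PyYAML and indexing the entries by model name. The field names come from the example above; the proxy's actual loading code may differ.

```python
import yaml  # PyYAML

# Inline copy of the example config for demonstration purposes
CONFIG = """
models:
  - name: qwen3:1.7b
    provider: ollama
    url: http://localhost:11434
    cost: 0.0
  - name: claude-3-5-sonnet-20241022
    provider: anthropic
    cost: 0.003
"""

def load_models(text: str) -> dict:
    """Parse the YAML and index model entries by name.

    url and cost are optional, so downstream code should use .get().
    """
    data = yaml.safe_load(text)
    return {m["name"]: m for m in data["models"]}

models = load_models(CONFIG)
print(models["qwen3:1.7b"]["url"])                       # http://localhost:11434
print(models["claude-3-5-sonnet-20241022"].get("url"))   # None
```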

Connecting with Clients

You can now point any OpenAI-compatible client to your proxy.

Base URL: http://localhost:8000/v1

Example: cURL

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
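The same request can be issued from Python using only the standard library. This is a sketch: the payload mirrors the cURL call above, and the helper function name is my own.

```python
import json
import urllib.request

def build_chat_request(model: str, content: str,
                       base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Construct an OpenAI-style chat completion request for the proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("gpt-4o-mini", "Hello!")
# To actually send it (requires the proxy to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```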

Example: Continue (VSCode Extension)

To use this with Continue, add the following to your config.json:

{
  "models": [
    {
      "title": "Cascadeflow Proxy",
      "provider": "openai",
      "model": "gpt-4o-mini",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "EMPTY" 
    }
  ]
}

Note: Some clients require the apiKey field, but the proxy ignores it; provider credentials are read from environment variables on the server side.

API Endpoints

  • POST /v1/chat/completions: Handles chat completion requests. Supports both streaming (stream=True) and non-streaming modes.
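With stream=true, OpenAI-compatible endpoints emit the response as Server-Sent Events: a sequence of data: lines carrying JSON chunks, terminated by data: [DONE]. Assuming this proxy follows that convention, a minimal client-side parser sketch looks like this:

```python
import json
from typing import Iterator

def iter_sse_chunks(lines: Iterator[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from an OpenAI-style SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)

# Example with a canned stream (chunk shape matches OpenAI streaming deltas):
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
text = "".join(chunk["choices"][0]["delta"].get("content", "")
               for chunk in iter_sse_chunks(iter(sample)))
print(text)  # Hello!
```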

License

MIT
