Vast PyWorker Examples

This repository contains example PyWorkers used by Vast.ai’s default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:

  • Exposes one or more HTTP routes (e.g., /v1/completions, /generate/sync)
  • Optionally validates/transforms request payloads
  • Computes per-request workload for autoscaling
  • Forwards requests to the local model server
  • Optionally supports FIFO queueing when the backend cannot process concurrent requests
  • Detects readiness/failure from model logs and runs a benchmark to estimate throughput

Important: The core PyWorker framework (Worker, WorkerConfig, HandlerConfig, BenchmarkConfig, LogActionConfig) is provided by the vastai / vastai-sdk Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on worker implementations and examples, not the framework internals.

Repository Purpose

Use this repository as:

  • A reference for how Vast templates wire up worker.py
  • A starting point for implementing your own custom Serverless PyWorker
  • A collection of working examples for common model backends

If you are looking for the framework code itself, refer to the Vast.ai SDK.

Project Structure

Typical layout:

  • workers/
    • Example worker implementations (each worker is usually a self-contained folder)
    • Each example typically includes:
      • worker.py (the entrypoint used by Serverless)
      • Optional sample workflows / payloads (for ComfyUI-based workers)
      • Optional local test harness scripts

How Serverless launches worker.py

On each worker instance, the template’s startup script typically:

  1. Clones your repository from PYWORKER_REPO
  2. Installs dependencies from requirements.txt
  3. Starts the model server (vLLM, TGI, ComfyUI, etc.)
  4. Runs:
    python worker.py

Your worker.py builds a WorkerConfig, constructs a Worker, and starts the PyWorker HTTP server.

worker.py

A PyWorker is usually a single worker.py that uses SDK configuration objects:

from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)

worker_config = WorkerConfig(
    model_server_url="http://127.0.0.1",
    model_server_port=18000,
    model_log_file="/var/log/model/server.log",
    handlers=[
        HandlerConfig(
            route="/v1/completions",
            allow_parallel_requests=True,
            max_queue_time=60.0,
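            # Per-request workload estimate used for autoscaling;
            # here it is simply the requested max_tokens.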
            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
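            # Payloads generated at startup to benchmark the model server
            # and estimate per-instance throughput.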
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"prompt": "hello", "max_tokens": 128},
                runs=8,
                concurrency=10,
            ),
        )
    ],
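    # Log lines used to detect readiness, fatal errors, and
    # informational events in the model server log.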
    log_action_config=LogActionConfig(
        on_load=["Application startup complete."],
        on_error=["Traceback (most recent call last):", "RuntimeError:"],
        on_info=['"message":"Download'],
    ),
)

Worker(worker_config).run()

Included Examples

This repository contains example PyWorkers corresponding to common Vast templates, including:

  • vLLM: OpenAI-compatible completions/chat endpoints with parallel request support
  • TGI (Text Generation Inference): OpenAI-compatible endpoints and log-based readiness
  • ComfyUI (Image / JSON workflows): /generate/sync for ComfyUI workflow execution
  • ComfyUI Wan 2.2 (T2V): ComfyUI workflow execution producing video outputs
  • ComfyUI ACE Step (Text-to-Music): ComfyUI workflow execution producing audio outputs

Exact worker paths and naming may vary by template; use the workers/ directory as the source of truth.

Getting Started (Local)

  1. Install Python dependencies for the examples you plan to run:

    pip install -r requirements.txt
  2. Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:

    • You know the model server URL/port
    • You have a log file path you can tail for readiness/error detection
  3. Run the worker:

    python worker.py

    or, if running an example from a subfolder:

    python workers/<example>/worker.py

Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust model_server_port and model_log_file for local usage.
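
Once the worker is running, you can smoke-test a route with a plain HTTP request. The sketch below is illustrative only: it assumes the /v1/completions handler from the worker.py example above, uses a placeholder PyWorker address and port (not defined by these examples), and sends no authentication, which may not match every deployment.

import json
import urllib.request

# Placeholder address and port; the PyWorker's listen port is deployment-specific.
PYWORKER_URL = "http://127.0.0.1:8000"

# Payload shape mirrors the benchmark generator in the worker.py example above.
payload = json.dumps({"prompt": "hello", "max_tokens": 128}).encode("utf-8")

request = urllib.request.Request(
    PYWORKER_URL + "/v1/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request, timeout=60) as response:
    print(response.status, response.read().decode("utf-8"))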

Deploying on Vast Serverless

To use a custom PyWorker with Serverless:

  1. Create a public Git repository containing:

    • worker.py
    • requirements.txt
  2. In your Serverless template / endpoint configuration, set:

    • PYWORKER_REPO to your Git repository URL
    • (Optional) PYWORKER_REF to a git ref (branch, tag, or commit); see the example after these steps
  3. The template startup script will clone/install and run your worker.py.
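
For example, a hypothetical endpoint configuration might look like this (both values are placeholders):

    PYWORKER_REPO=https://github.com/<your-user>/<your-pyworker-repo>
    PYWORKER_REF=main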

Guidance for Custom Workers

When implementing your own worker (a short configuration sketch follows this list):

  • Define one HandlerConfig per route you want to expose.
  • Choose a workload function that correlates with compute cost:
    • LLMs: prompt tokens + max output tokens (or max_tokens as a simpler proxy)
    • Non-LLMs: constant cost per request (e.g., 100.0) is often sufficient
  • Set allow_parallel_requests=False for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
  • Configure exactly one BenchmarkConfig across all handlers to enable capacity estimation.
  • Use LogActionConfig to reliably detect “model loaded” and “fatal error” log lines.
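
As a concrete illustration of the points above, here is a minimal sketch of a handler for a non-LLM backend that processes one request at a time. The route and constant workload mirror the ComfyUI-style examples; the benchmark payload and numeric values are placeholders, not settings taken from this repository.

from vastai import BenchmarkConfig, HandlerConfig

# Sketch only: a single-request, constant-cost handler.
handler = HandlerConfig(
    route="/generate/sync",
    allow_parallel_requests=False,               # backend cannot handle concurrency
    max_queue_time=60.0,
    workload_calculator=lambda payload: 100.0,   # constant cost per request
    benchmark_config=BenchmarkConfig(
        # Placeholder payload; a real worker would supply a sample workflow.
        generator=lambda: {},
        runs=4,
        concurrency=1,
    ),
)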

Community & Support