All REST endpoints are served under the /v1 prefix.
Interactive OpenAPI docs are available at /docs (Swagger UI) and /redoc.
MATASERVER ships a `mataserver` console script with the following subcommands.

```shell
mataserver [--version] <COMMAND> [options]
```
| Command | Description |
|---|---|
| `serve` | Start the inference server (Uvicorn) |
| `pull <model> --task T` | Download a model from HuggingFace and register it |
| `list` / `ls` | List all registered models |
| `show <model>` | Show detailed info for a registered model |
| `rm <model>` | Remove a model from the registry (does not delete weights) |
| `load <model>` / `warmup` | Preload a model into memory (requires running server) |
| `stop <model>` | Unload a model from memory (requires running server) |
| `version` | Print the mataserver version and exit |
Global flags:
| Flag | Description |
|---|---|
| `-v` / `--version` | Print version and exit |
Running `mataserver` with no subcommand prints help and exits.
Start the Uvicorn inference server. Settings are read from environment variables, a .env file, or a YAML config (see configuration).
```shell
mataserver serve
```

Download a model and register it with the server so it appears in `GET /v1/models` and can be used for inference. Supports two backend types:
- HuggingFace models — downloaded via `huggingface_hub.snapshot_download()` and stored in the standard HF cache (`~/.cache/huggingface`). These are identified by a slash-separated `org/repo-name` ID.
- Pip-based OCR backends — installed via `pip` into the current Python environment. These are identified by a short backend name (e.g. `easyocr`, `paddleocr`, `tesseract`).
```shell
mataserver pull <MODEL_ID> --task <TASK>
```

| Argument | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID (`org/name`) or pip backend name (`easyocr`, `paddleocr`, `tesseract`) |
| `--task` | Inference task. One of: `classify`, `depth`, `detect`, `ocr`, `pose`, `segment`, `track`, `vlm` |
After a successful pull the model ID, task, and source type are written to `model_registry.json` in `MATA_SERVER_DATA_DIR`.
Examples:
```shell
# Object detection (HuggingFace)
mataserver pull facebook/detr-resnet-50 --task=detect

# Image classification (HuggingFace)
mataserver pull google/vit-base-patch16-224 --task=classify

# Depth estimation (HuggingFace)
mataserver pull LiheYoung/depth-anything-base-hf --task=depth

# OCR — HuggingFace models
mataserver pull stepfun-ai/GOT-OCR-2.0-hf --task ocr
mataserver pull microsoft/trocr-base-printed --task ocr

# OCR — pip-based backends
mataserver pull easyocr --task ocr
mataserver pull paddleocr --task ocr
mataserver pull tesseract --task ocr  # also requires the tesseract system binary
```

Pip OCR backends:
`easyocr`, `paddleocr`, and `tesseract` are installed as Python packages into the active virtual environment rather than downloaded from HuggingFace. `tesseract` additionally requires the `tesseract-ocr` system binary; if it is not found on `PATH` a warning is printed but the pull still succeeds. See OCR Backends for details.
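The PATH check described above can be reproduced client-side with the standard library; `check_system_binary` is an illustrative helper, not mataserver's actual implementation:

```python
import shutil

def check_system_binary(name):
    """Mirror the pull-time behaviour: a missing system binary such as
    `tesseract` produces a warning, not a failure."""
    path = shutil.which(name)
    if path is None:
        print(f"warning: system binary '{name}' not found on PATH")
    return path
```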
Exit codes:
| Code | Meaning |
|---|---|
| 0 | Pull completed successfully |
| 1 | Pull failed (network error, unrecognised model, etc.) |
| 2 | Argument error (missing or invalid `--task`) |
Tip: You can also pull models via the REST API without stopping the running server — see `POST /v1/models/pull`.
List all models registered with the server. Reads from `model_registry.json` in `MATA_SERVER_DATA_DIR`.

Alias: `mataserver ls`

```shell
mataserver list
```

Output — tabular display with columns:
| Column | Description |
|---|---|
| MODEL | HuggingFace repo ID or pip backend name |
| TASK | Inference task (`detect`, `segment`, etc.) |
| SOURCE | `hf` for HuggingFace models, `pip` for pip-based backends |
| SIZE (MB) | On-disk size in MB from the HuggingFace cache, or `—` if unknown |
If no models are registered, prints "No models registered.".
Example:
```shell
mataserver list

MODEL                        TASK      SOURCE  SIZE (MB)
--------------------------------------------------------------
facebook/detr-resnet-50      detect    hf      167.3
google/vit-base-patch16-224  classify  hf      327.5
easyocr                      ocr       pip     —
tesseract                    ocr       pip     —
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Success |
Show detailed information for a registered model.
```shell
mataserver show <MODEL_ID>
```

| Argument | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID (e.g. `facebook/detr-resnet-50`) or pip backend name |
Output fields:
| Field | Description |
|---|---|
| `model` | HuggingFace repo ID or pip backend name |
| `task` | Registered inference task |
| `source` | `hf` (HuggingFace) or `pip` (pip-based backend) |
| `size` | On-disk size in MB from HF cache, or `—` (pip models have no cache) |
| `last_accessed` | Timestamp of last HF cache access, or `—` (pip models) |
| `pip_packages` | (pip only) Comma-separated list of installed pip packages |
| `installed` | (pip only) `yes` / `no` — whether the package is importable |
| `system_binary` | (pip only, if applicable) Binary name and whether it was found |
Example — HuggingFace model:

```shell
mataserver show facebook/detr-resnet-50

model: facebook/detr-resnet-50
task: detect
source: hf
size: 167.30 MB
last_accessed: 2026-03-05 14:22:01
```

Example — pip backend:

```shell
mataserver show easyocr

model: easyocr
task: ocr
source: pip
size: —
last_accessed: —
pip_packages: easyocr
installed: yes
```

Example — Tesseract (with system binary check):

```shell
mataserver show tesseract

model: tesseract
task: ocr
source: pip
size: —
last_accessed: —
pip_packages: pytesseract
installed: yes
system_binary: tesseract (yes)
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Model not found in registry |
Remove a model from the registry. This only removes the entry from `model_registry.json` — model weights in the HuggingFace cache are not deleted.
```shell
mataserver rm <MODEL_ID>
```

| Argument | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID or pip backend name to remove from the registry |
Example — HuggingFace model:

```shell
mataserver rm facebook/detr-resnet-50

Removed 'facebook/detr-resnet-50' from the registry.
Note: model weights on disk (HF cache) were not deleted.
```

Example — pip backend:

```shell
mataserver rm easyocr

Removed 'easyocr' from the registry.
Note: pip packages were not uninstalled. Remove manually if needed.
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Removed successfully |
| 1 | Model not found in registry |
Note: The `load` and `stop` subcommands communicate with a running MATASERVER instance over HTTP. They do not work offline. Use `--url` to specify the server address if it differs from the default.
Preload a model into memory without running inference. Useful for eliminating cold-start latency. Sends POST /v1/models/warmup to the running server.
Alias: mataserver warmup
```shell
mataserver load <MODEL_ID> [--url URL] [--api-key KEY]
```

| Argument / Option | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID to warm up |
| `--url URL` | Server base URL (default: `http://localhost:<MATA_SERVER_PORT>` from settings) |
| `--api-key KEY` | API key for auth (default: `MATA_SERVER_API_KEY` environment variable) |
URL resolution: When `--url` is not provided, the CLI reads `MATA_SERVER_HOST` and `MATA_SERVER_PORT` from settings. If host is `0.0.0.0`, `localhost` is used instead.
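The resolution order described above can be sketched in Python. `resolve_base_url` is a hypothetical helper (the default port 8110 is taken from the examples in these docs), not mataserver's actual code:

```python
import os

def resolve_base_url(url=None):
    """An explicit --url wins; otherwise build one from
    MATA_SERVER_HOST / MATA_SERVER_PORT, substituting localhost
    for a 0.0.0.0 bind address."""
    if url:
        return url.rstrip("/")
    host = os.environ.get("MATA_SERVER_HOST", "0.0.0.0")
    port = os.environ.get("MATA_SERVER_PORT", "8110")
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"
```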
Example:
```shell
# Default server address
mataserver load facebook/detr-resnet-50

# Explicit server address and API key
mataserver load facebook/detr-resnet-50 --url http://192.168.1.10:8110 --api-key my-secret

# Using the alias
mataserver warmup facebook/detr-resnet-50
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Model loaded successfully |
| 1 | Connection error or HTTP error from the server |
Unload a model from memory. Sends POST /v1/models/unload to the running server.
```shell
mataserver stop <MODEL_ID> [--url URL] [--api-key KEY]
```

| Argument / Option | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID to unload |
| `--url URL` | Server base URL (default: `http://localhost:<MATA_SERVER_PORT>` from settings) |
| `--api-key KEY` | API key for auth (default: `MATA_SERVER_API_KEY` environment variable) |
Example:
```shell
mataserver stop facebook/detr-resnet-50

Model 'facebook/detr-resnet-50' unloaded from memory.
```

If the model was not loaded, the output is:

```
Model 'facebook/detr-resnet-50' was not loaded (nothing to stop).
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Model unloaded (or was not loaded) |
| 1 | Connection error or HTTP error from the server |
Print the mataserver version and exit. Equivalent to `mataserver -v` or `mataserver --version`.

```shell
mataserver version

mataserver 0.6.0
```

The `ocr` task supports two categories of model backend:
These are pulled like any other model — weights are downloaded into the HuggingFace cache and source is recorded as "hf":
| Model ID | Notes |
|---|---|
| `stepfun-ai/GOT-OCR-2.0-hf` | GOT-OCR2 — general-purpose OCR |
| `microsoft/trocr-base-printed` | TrOCR — optimised for printed text |
| `microsoft/trocr-base-handwritten` | TrOCR — optimised for handwritten text |
These are installed as Python packages (and optionally require a system binary). source is recorded as "pip". They do not occupy space in the HuggingFace cache.
| Backend name | Pip packages installed | System binary required | Notes |
|---|---|---|---|
| `easyocr` | `easyocr` | None | Supports 80+ languages |
| `paddleocr` | `paddlepaddle`, `paddleocr` | None | High accuracy; larger install |
| `tesseract` | `pytesseract` | `tesseract-ocr` | Requires system binary (see note below) |
Tesseract system binary: `mataserver pull tesseract --task ocr` installs the `pytesseract` Python wrapper but not the `tesseract-ocr` binary itself. Install it separately:

- Debian/Ubuntu: `apt-get install -y tesseract-ocr`
- macOS: `brew install tesseract`
- Windows: Download from UB-Mannheim/tesseract
If the binary is not found at pull time, a warning is printed but the backend is still registered.
```shell
mataserver show easyocr

source: pip
pip_packages: easyocr
installed: yes
```

```shell
mataserver show tesseract

source: pip
pip_packages: pytesseract
installed: yes
system_binary: tesseract (yes)
```

`mataserver rm <backend>` removes the registration entry only. The pip packages are not uninstalled automatically; remove them manually with `pip uninstall <package>` if desired.
Most endpoints require a Bearer token in the Authorization header.
```
Authorization: Bearer <api_key>
```

Authentication behaviour is controlled by `MATA_SERVER_AUTH_MODE`:
| Mode | Behaviour |
|---|---|
| `api_key` | Token required; 401 if missing, 403 if invalid |
| `none` | All requests accepted without a token (dev/test use) |
401 Unauthorized — missing Authorization header:
```json
{ "detail": "Missing Authorization header" }
```

403 Forbidden — token not in the allowed list:

```json
{ "detail": "Invalid API key" }
```

All error responses follow the standard FastAPI detail structure:

```json
{ "detail": "<human-readable message>" }
```

HTTP status codes used across the API:
| Code | Meaning |
|---|---|
| 400 | Bad request (invalid input, decode error) |
| 401 | Unauthorized — missing credentials |
| 403 | Forbidden — invalid credentials |
| 404 | Resource not found |
| 409 | Conflict (e.g. pull already in progress) |
| 500 | Internal server error |
| 503 | Service unavailable (session limit exceeded) |
| 507 | Insufficient storage (out of VRAM/RAM) |
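A client can fold these status codes and the detail structure into a single exception type. The helper below is a sketch, not part of any official mataserver client:

```python
import json

class MataServerError(Exception):
    """Carries the HTTP status code and the FastAPI-style detail message."""
    def __init__(self, status_code, detail):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail

def raise_for_detail(status_code, body):
    """Turn an error response body ({"detail": ...}) into an exception;
    success statuses pass through silently."""
    if status_code < 400:
        return
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, TypeError, AttributeError):
        detail = body  # non-JSON body: use it verbatim
    raise MataServerError(status_code, detail)
```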
Returns server status. No authentication required.
Response 200 OK:
```json
{
  "status": "ok",
  "version": "0.1.0",
  "gpu_available": true
}
```

| Field | Type | Description |
|---|---|---|
| `status` | string | Always `"ok"` when the server is up |
| `version` | string | MATASERVER release version |
| `gpu_available` | boolean | `true` if a CUDA-capable GPU is present |
Example:
```shell
curl http://localhost:8110/v1/health
```

All model endpoints require authentication.
List all models that are currently installed (i.e. present in the HuggingFace cache and registered with the server).
Response 200 OK — array of ModelInfo objects:
```json
[
  {
    "model": "PekingU/rtdetr_v2_r101vd",
    "task": "detect",
    "source": "hf",
    "state": "idle",
    "size_mb": 421.0,
    "memory_mb": 512.0,
    "loaded_at": 1709550000.0,
    "last_used": 1709553600.0
  },
  {
    "model": "easyocr",
    "task": "ocr",
    "source": "pip",
    "state": "unloaded",
    "size_mb": null,
    "memory_mb": null,
    "loaded_at": null,
    "last_used": null
  }
]
```

| Field | Type | Description |
|---|---|---|
| `model` | string | HuggingFace repo ID or pip backend name |
| `task` | string | Inference task (`detect`, `segment`, `classify`, …) |
| `source` | string | `"hf"` for HuggingFace models, `"pip"` for pip-based backends |
| `state` | string | Current lifecycle state (see table below) |
| `size_mb` | number \| null | On-disk size in MB from the HF cache; null for pip backends |
| `memory_mb` | number \| null | Allocated memory in MB (null when unloaded) |
| `loaded_at` | number \| null | Unix timestamp of when the model was loaded |
| `last_used` | number \| null | Unix timestamp of the most recent inference call |
Model state values:
| State | Meaning |
|---|---|
| `unloaded` | Installed but not in memory |
| `loading` | Currently loading weights |
| `ready` | In memory and ready to serve requests |
| `active` | Currently running an inference |
| `idle` | In memory but idle (eligible for keep-alive expiry) |
| `evicted` | Was in memory but was evicted to free resources |
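For client code it can be handy to mirror the ModelInfo shape as a small dataclass. This is an illustrative sketch, not a class exported by mataserver; which states count as "in memory" is inferred from the table above:

```python
from dataclasses import dataclass
from typing import Optional

# States in which the model occupies memory (assumption from the state table).
IN_MEMORY_STATES = {"loading", "ready", "active", "idle"}

@dataclass
class ModelInfo:
    """Client-side mirror of one entry from GET /v1/models."""
    model: str
    task: str
    source: str
    state: str
    size_mb: Optional[float] = None
    memory_mb: Optional[float] = None
    loaded_at: Optional[float] = None
    last_used: Optional[float] = None

    @property
    def in_memory(self) -> bool:
        return self.state in IN_MEMORY_STATES
```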
Example:
```shell
curl -H "Authorization: Bearer $KEY" http://localhost:8110/v1/models
```

Retrieve full details for a specific model.
`model_id` is the HuggingFace repo ID, e.g. `PekingU/rtdetr_v2_r101vd`.
Response 200 OK — ModelInfo object (same shape as the list items above).
Response 404 Not Found:

```json
{ "detail": "Model not found: PekingU/rtdetr_v2_r101vd" }
```

Example:

```shell
curl -H "Authorization: Bearer $KEY" \
  http://localhost:8110/v1/models/PekingU/rtdetr_v2_r101vd
```

Download or install a model and register it with the server. Supports both HuggingFace models (downloaded into `~/.cache/huggingface`) and pip-based OCR backends (installed into the active Python environment). The 202 response is returned once the operation completes.
Request body:
```json
{
  "model": "PekingU/rtdetr_v2_r101vd",
  "task": "detect"
}
```

| Field | Type | Description |
|---|---|---|
| `model` | string | HuggingFace repo ID (`"org/model-name"`) or pip backend name (`"easyocr"`, …) |
| `task` | string | Inference task (`detect`, `segment`, `ocr`, …) |
Response 202 Accepted:
```json
{ "status": "pulled", "model": "PekingU/rtdetr_v2_r101vd" }
```

Error responses:
| Code | Condition |
|---|---|
| 400 | Pull failed (network error, model not found, task mismatch, pip install error) |
| 409 | A pull for the same model is already in progress |
| 500 | Unexpected server error |
Examples:
```shell
# HuggingFace model
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "PekingU/rtdetr_v2_r101vd", "task": "detect"}'

# Pip-based OCR backend
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "easyocr", "task": "ocr"}'
```

Pre-load a model into memory without running inference. Useful for eliminating cold-start latency before the first request.
Request body:
```json
{ "model": "datamata/rtdetr-l" }
```

Response 200 OK:

```json
{ "status": "ready", "model": "datamata/rtdetr-l" }
```

Error responses:
| Code | Condition |
|---|---|
| 404 | Model ref not found in registry |
| 507 | Insufficient VRAM/RAM to load the model |
| 500 | Unexpected server error |
Example:
```shell
curl -X POST http://localhost:8110/v1/models/warmup \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l"}'
```

All inference endpoints require authentication. The server auto-loads the requested model on the first call if it is not already in memory.
All inference endpoints return the same mata.v1 schema:
```json
{
  "schema_version": "mata.v1",
  "task": "detect",
  "model": "datamata/rtdetr-l",
  "timestamp": 1709553600.123,
  "detections": [
    {
      "label": "person",
      "confidence": 0.94,
      "bbox": [120.5, 45.2, 380.1, 520.8]
    }
  ]
}
```

Only the task-relevant field is populated in each response:
| Task | Populated field | Type |
|---|---|---|
| `detect` | `detections` | `Detection[]` |
| `segment` | `segments` | `Segment[]` |
| `classify` | `classifications` | `Classification[]` |
| `pose` | `keypoints` | `Keypoint[]` |
| `ocr` | `text`, `regions` | `string`, `TextRegion[]` |
| `depth` | `depth_map` | base64 float32 array |
| zero-shot | `classifications` | `Classification[]` |
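The task-to-field dispatch in the table above can be sketched in Python. `extract_results` is an illustrative client-side helper, not mataserver code:

```python
# Which response field carries the payload for each task (from the table above).
TASK_FIELDS = {
    "detect": "detections",
    "segment": "segments",
    "classify": "classifications",
    "pose": "keypoints",
    "depth": "depth_map",
    "zero-shot": "classifications",
}

def extract_results(response):
    """Pull the task-relevant payload out of a mata.v1 response dict.
    OCR is the one two-field case (text plus regions)."""
    task = response["task"]
    if task == "ocr":
        return {"text": response.get("text"), "regions": response.get("regions")}
    field = TASK_FIELDS.get(task)
    return response.get(field) if field else None
```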
Detection:
```json
{ "label": "car", "confidence": 0.87, "bbox": [x1, y1, x2, y2] }
```

Segment (same as Detection plus optional mask):

```json
{ "label": "car", "confidence": 0.87, "bbox": [x1, y1, x2, y2], "mask": "<base64>" }
```

Classification:

```json
{ "label": "tabby cat", "confidence": 0.92 }
```

Keypoint:

```json
{ "label": "person", "confidence": 0.88, "bbox": [x1, y1, x2, y2], "keypoints": [[x, y, conf], ...] }
```

TextRegion:

```json
{ "text": "Hello world", "bbox": [x1, y1, x2, y2], "confidence": 0.99 }
```

Run single-shot inference with a JSON body containing a base64-encoded image.
Request body:
```json
{
  "model": "datamata/rtdetr-l",
  "image": "<base64-encoded image bytes>",
  "params": {
    "confidence": 0.4
  }
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model reference (must be installed) |
| `image` | string | ✅ | Base64-encoded image (JPEG or PNG recommended) |
| `params` | object | ❌ | Typed inference parameters (see table below) |
All fields are optional. Only send the parameters relevant to your model's task.
| Field | Type | Tasks | Description | MATA native name |
|---|---|---|---|---|
| `confidence` | float [0.0–1.0] | detect, segment, classify, pose | Confidence threshold | `threshold` |
| `prompts` | string \| string[] | zero-shot detect/segment/classify, SAM3 | Text prompts for zero-shot models | `text_prompts` |
| `prompt` | string | vlm | Text question about the image | `prompt` |
| `system_prompt` | string | vlm | System prompt override | `system_prompt` |
| `max_tokens` | int (>0) | vlm, ocr (HF) | Maximum tokens to generate | `max_new_tokens` |
| `temperature` | float (≥0) | vlm | Sampling temperature (0 = greedy) | `temperature` |
| `top_p` | float [0.0–1.0] | vlm | Nucleus sampling probability | `top_p` |
| `top_k` | int (>0) | classify, vlm | Top-K predictions / sampling | `top_k` |
| `output_mode` | string | vlm | Output mode: `json`, `detect`, `classify`, `describe` | `output_mode` |
| `use_softmax` | bool | classify (CLIP) | Apply softmax normalization | `use_softmax` |
| `language` | string | ocr (Tesseract, Paddle) | OCR language code override | `lang` |
| `ocr_type` | string | ocr (GOT-OCR2) | `ocr` (plain text) or `format` (markdown/LaTeX) | `ocr_type` |
| `detail` | int (0 or 1) | ocr (EasyOCR) | 0 = text-only, 1 = full with bboxes | `detail` |
| `normalize` | bool | depth | Normalize depth map values | `normalize` |
| `target_size` | [int, int] | depth | Output depth map size [height, width] | `target_size` |
| `point_prompts` | [[x,y,label],...] | SAM | Point prompts (label: 1=foreground, 0=background) | `point_prompts` |
| `box_prompts` | [[x1,y1,x2,y2],...] | SAM | Bounding box prompts | `box_prompts` |
| `detection_confidence` | float [0.0–1.0] | pipeline | Detection threshold for GroundingDINO+SAM pipeline | `detection_threshold` |
| `segmentation_confidence` | float [0.0–1.0] | pipeline | Segmentation threshold for GroundingDINO+SAM pipeline | `segmentation_threshold` |
Migration note: The `params` object previously accepted raw MATA-native names (`threshold`, `text_prompts`, `max_new_tokens`, `lang`, `detection_threshold`, `segmentation_threshold`). These are now rejected with `422 Unprocessable Entity`. Use the user-friendly names from the table above. See the full mapping below.
Object Detection:
```json
{
  "model": "datamata/rtdetr-l",
  "image": "<base64>",
  "params": { "confidence": 0.4 }
}
```

Zero-shot Detection:

```json
{
  "model": "datamata/grounding-dino",
  "image": "<base64>",
  "params": { "prompts": "cat . dog . person", "confidence": 0.3 }
}
```

Classification:

```json
{ "model": "datamata/resnet50", "image": "<base64>", "params": { "top_k": 5 } }
```

Zero-shot Classification (CLIP):

```json
{
  "model": "datamata/clip-vit",
  "image": "<base64>",
  "params": { "prompts": ["cat", "dog", "bird"], "top_k": 3 }
}
```

VLM:

```json
{
  "model": "datamata/llava-1.5",
  "image": "<base64>",
  "params": {
    "prompt": "What objects are in this image?",
    "max_tokens": 256,
    "temperature": 0.3
  }
}
```

OCR:

```json
{
  "model": "datamata/easyocr",
  "image": "<base64>",
  "params": { "language": "jpn", "detail": 1 }
}
```

Depth:

```json
{
  "model": "datamata/depth-anything",
  "image": "<base64>",
  "params": { "target_size": [480, 640], "normalize": true }
}
```

Segmentation (SAM):

```json
{
  "model": "datamata/sam2",
  "image": "<base64>",
  "params": { "prompts": "cat", "confidence": 0.5 }
}
```

Response 200 OK — mata.v1 InferResponse (see schema above).
Error responses:
| Code | Condition |
|---|---|
| 400 | Missing image field or invalid base64 |
| 422 | Invalid or unknown params field |
| 404 | Model ref not found / not installed |
| 500 | Engine or runtime error |
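As a client-side convenience, the JSON body for this endpoint (model, base64-encoded image, params) can be assembled in Python. `build_infer_payload` is a hypothetical helper, not part of mataserver:

```python
import base64
import json

def build_infer_payload(model, image_bytes, **params):
    """Assemble the POST /v1/infer JSON body: base64-encode the raw image
    bytes and nest any keyword arguments under "params"."""
    return json.dumps({
        "model": model,
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "params": params,
    })
```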
Example:
```shell
IMAGE_B64=$(base64 -w0 photo.jpg)
curl -X POST http://localhost:8110/v1/infer \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"datamata/rtdetr-l\", \"image\": \"$IMAGE_B64\", \"params\": {\"confidence\": 0.4}}"
```

Run single-shot inference with a multipart form upload. Convenient when sending images directly from disk or a webcam capture without pre-encoding.
Request — multipart/form-data:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model reference |
| `file` | file | ✅ | Image file (JPEG, PNG, etc.) |
| `confidence` | float | ❌ | Confidence threshold [0.0–1.0] |
| `prompts` | string | ❌ | Text prompts for zero-shot (dot-separated) |
| `prompt` | string | ❌ | Text question for VLM |
| `language` | string | ❌ | OCR language code |
| `top_k` | int | ❌ | Number of top predictions |
| `max_tokens` | int | ❌ | Max tokens to generate |
| `temperature` | float | ❌ | Sampling temperature (VLM) |
| `output_mode` | string | ❌ | VLM output mode |
| `ocr_type` | string | ❌ | OCR output type |
| `detail` | int | ❌ | OCR detail level (0 or 1) |
Complex params (`point_prompts`, `box_prompts`, `target_size`, `normalize`, `use_softmax`, `system_prompt`, `top_p`, `detection_confidence`, `segmentation_confidence`) are JSON-endpoint-only and cannot be passed via multipart upload.
Response 200 OK — mata.v1 InferResponse (see schema above).
Error responses:
| Code | Condition |
|---|---|
| 400 | Uploaded file is empty |
| 422 | Invalid form field value |
| 404 | Model ref not found |
| 500 | Engine or runtime error |
Example:
```shell
curl -X POST http://localhost:8110/v1/infer/upload \
  -H "Authorization: Bearer $KEY" \
  -F "model=datamata/rtdetr-l" \
  -F "confidence=0.4" \
  -F "file=@photo.jpg"
```

The `params` object uses user-friendly names that differ from the MATA adapter's internal kwarg names. Sending the old MATA-native names directly will return `422 Unprocessable Entity`.
| User-friendly name (API) | MATA native name (internal) |
|---|---|
| `confidence` | `threshold` |
| `prompts` | `text_prompts` |
| `max_tokens` | `max_new_tokens` |
| `language` | `lang` |
| `detection_confidence` | `detection_threshold` |
| `segmentation_confidence` | `segmentation_threshold` |
All other fields (`prompt`, `system_prompt`, `temperature`, `top_p`, `top_k`, `output_mode`, `use_softmax`, `ocr_type`, `detail`, `normalize`, `target_size`, `point_prompts`, `box_prompts`) pass through unchanged.
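The mapping, and the 422-style rejection of raw native names, can be expressed as a small translation table. `to_native` is an illustrative re-implementation for client-side validation, not the server's code:

```python
# API name -> MATA-native kwarg (from the mapping table above).
API_TO_NATIVE = {
    "confidence": "threshold",
    "prompts": "text_prompts",
    "max_tokens": "max_new_tokens",
    "language": "lang",
    "detection_confidence": "detection_threshold",
    "segmentation_confidence": "segmentation_threshold",
}
NATIVE_ONLY = set(API_TO_NATIVE.values())

def to_native(params):
    """Translate API param names to MATA-native kwargs, rejecting raw
    native names the same way the server answers with 422."""
    out = {}
    for key, value in params.items():
        if key in NATIVE_ONLY:
            raise ValueError(f"unknown field: {key} (use the API name)")
        out[API_TO_NATIVE.get(key, key)] = value
    return out
```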
Sessions are used together with the WebSocket streaming endpoint (/v1/stream/{session_id}).
All session endpoints require authentication.
Create a new streaming session. The server ensures the requested model is loaded before returning.
Request body:
```json
{
  "model": "datamata/rtdetr-l",
  "task": "detect",
  "params": { "confidence": 0.4 }
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model reference |
| `task` | string \| null | ❌ | Inference task override |
| `params` | object | ❌ | Parameters forwarded to `adapter.predict()` on each frame |
Response 201 Created:
```json
{
  "session_id": "sess_3f8a1b2c9d4e",
  "ws_url": "ws://localhost:8110/v1/stream/sess_3f8a1b2c9d4e"
}
```

The `ws_url` uses `wss://` when the server is reached over HTTPS.
Error responses:
| Code | Condition |
|---|---|
| 404 | Model ref not found |
| 503 | Maximum concurrent sessions reached (limit: 10) |
| 507 | Insufficient memory to load the model |
Example:
```shell
curl -X POST http://localhost:8110/v1/sessions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l", "task": "detect", "params": {"confidence": 0.5}}'
```

Close and clean up a streaming session. If the WebSocket is still connected it will be disconnected.
Response 204 No Content — on success (empty body).
Response 404 Not Found:

```json
{ "detail": "Session not found: sess_3f8a1b2c9d4e" }
```

Example:

```shell
curl -X DELETE http://localhost:8110/v1/sessions/sess_3f8a1b2c9d4e \
  -H "Authorization: Bearer $KEY"
```

See streaming.md for the full WebSocket protocol specification.
WebSocket endpoint for real-time frame-by-frame inference. Clients send binary frames and receive JSON inference results. Requires a session created via POST /v1/sessions.
Query parameters:
| Parameter | Required | Description |
|---|---|---|
| `token` | ✅* | API key (required when `auth_mode = api_key`) |
WebSocket close codes:
| Code | Meaning |
|---|---|
| 4001 | Unauthorized — bad or missing ?token= |
| 4004 | Invalid or expired session ID |
| 1011 | Internal server error during inference |
Example connection URL:
```
ws://localhost:8110/v1/stream/sess_3f8a1b2c9d4e?token=my-api-key
```
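Deriving the connection URL from the server's HTTP base URL can be sketched as follows; `stream_url` is a hypothetical client helper:

```python
from urllib.parse import urlencode

def stream_url(base_url, session_id, token=None):
    """Build the WebSocket URL for /v1/stream/{session_id}: http maps to
    ws and https to wss, with ?token= appended when auth is enabled."""
    scheme = "wss" if base_url.startswith("https://") else "ws"
    host = base_url.split("://", 1)[-1].rstrip("/")
    url = f"{scheme}://{host}/v1/stream/{session_id}"
    if token:
        url = f"{url}?{urlencode({'token': token})}"
    return url
```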