All REST endpoints are served under the /v1 prefix.
Interactive OpenAPI docs are available at /docs (Swagger UI) and /redoc.
MATASERVER ships a `mataserver` console script with the following subcommands.

```shell
mataserver [--version] <COMMAND> [options]
```
| Command | Description |
|---|---|
| `serve` | Start the inference server (Uvicorn) |
| `pull <model> --task T` | Download a model from HuggingFace and register it |
| `list` / `ls` | List all registered models |
| `show <model>` | Show detailed info for a registered model |
| `rm <model>` | Remove a model from the registry (does not delete weights) |
| `load <model>` / `warmup` | Preload a model into memory (requires running server) |
| `stop <model>` | Unload a model from memory (requires running server) |
| `version` | Print the mataserver version and exit |
Global flags:
| Flag | Description |
|---|---|
| `-v` / `--version` | Print version and exit |
Running `mataserver` with no subcommand prints help and exits.
Start the Uvicorn inference server. Settings are read from environment variables, a .env file, or a YAML config (see configuration).
```shell
mataserver serve
```

Download a model and register it with the server so it appears in `GET /v1/models` and can be used for inference. Supports two backend types:
- HuggingFace models — downloaded via `huggingface_hub.snapshot_download()` and stored in the standard HF cache (`~/.cache/huggingface`). These are identified by a slash-separated `org/repo-name` ID.
- Pip-based OCR backends — installed via `pip` into the current Python environment. These are identified by a short backend name (e.g. `easyocr`, `paddleocr`, `tesseract`).
```shell
mataserver pull <MODEL_ID> --task <TASK>
```

| Argument | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID (`org/name`) or pip backend name (`easyocr`, `paddleocr`, `tesseract`) |
| `--task` | Inference task. One of: `classify`, `depth`, `detect`, `ocr`, `pose`, `segment`, `track`, `vlm` |
After a successful pull the model ID, task, and source type are written to `model_registry.json` in `MATA_SERVER_DATA_DIR`.
Examples:
```shell
# Object detection (HuggingFace)
mataserver pull facebook/detr-resnet-50 --task=detect

# Image classification (HuggingFace)
mataserver pull google/vit-base-patch16-224 --task=classify

# Depth estimation (HuggingFace)
mataserver pull LiheYoung/depth-anything-base-hf --task=depth

# OCR — HuggingFace models
mataserver pull stepfun-ai/GOT-OCR-2.0-hf --task ocr
mataserver pull microsoft/trocr-base-printed --task ocr

# OCR — pip-based backends
mataserver pull easyocr --task ocr
mataserver pull paddleocr --task ocr
mataserver pull tesseract --task ocr  # also requires the tesseract system binary
```

Pip OCR backends:
`easyocr`, `paddleocr`, and `tesseract` are installed as Python packages into the active virtual environment rather than downloaded from HuggingFace. `tesseract` additionally requires the `tesseract-ocr` system binary; if it is not found on `PATH` a warning is printed but the pull still succeeds. See OCR Backends for details.
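The PATH check described above can be reproduced client-side with the standard library; `check_system_binary` is an illustrative helper, not mataserver's actual implementation:

```python
import shutil

def check_system_binary(name):
    """Mirror the pull-time behaviour: a missing system binary such as
    `tesseract` produces a warning, not a failure."""
    path = shutil.which(name)
    if path is None:
        print(f"warning: system binary '{name}' not found on PATH")
    return path
```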
Exit codes:
| Code | Meaning |
|---|---|
| 0 | Pull completed successfully |
| 1 | Pull failed (network error, unrecognised model, etc.) |
| 2 | Argument error (missing or invalid `--task`) |
Tip: You can also pull models via the REST API without stopping the running server — see `POST /v1/models/pull`.
List all models registered with the server. Reads from `model_registry.json` in `MATA_SERVER_DATA_DIR`.

Alias: `mataserver ls`

```shell
mataserver list
```

Output — tabular display with columns:
| Column | Description |
|---|---|
| MODEL | HuggingFace repo ID or pip backend name |
| TASK | Inference task (`detect`, `segment`, etc.) |
| SOURCE | `hf` for HuggingFace models, `pip` for pip-based backends |
| SIZE (MB) | On-disk size in MB from the HuggingFace cache, or `—` if unknown |
If no models are registered, prints "No models registered.".
Example:
```shell
mataserver list

MODEL                        TASK      SOURCE  SIZE (MB)
--------------------------------------------------------------
facebook/detr-resnet-50      detect    hf      167.3
google/vit-base-patch16-224  classify  hf      327.5
easyocr                      ocr       pip     —
tesseract                    ocr       pip     —
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Success |
Show detailed information for a registered model.
```shell
mataserver show <MODEL_ID>
```

| Argument | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID (e.g. `facebook/detr-resnet-50`) or pip backend name |
Output fields:
| Field | Description |
|---|---|
| `model` | HuggingFace repo ID or pip backend name |
| `task` | Registered inference task |
| `source` | `hf` (HuggingFace) or `pip` (pip-based backend) |
| `size` | On-disk size in MB from HF cache, or `—` (pip models have no cache) |
| `last_accessed` | Timestamp of last HF cache access, or `—` (pip models) |
| `pip_packages` | (pip only) Comma-separated list of installed pip packages |
| `installed` | (pip only) `yes` / `no` — whether the package is importable |
| `system_binary` | (pip only, if applicable) Binary name and whether it was found |
Example — HuggingFace model:

```shell
mataserver show facebook/detr-resnet-50

model: facebook/detr-resnet-50
task: detect
source: hf
size: 167.30 MB
last_accessed: 2026-03-05 14:22:01
```

Example — pip backend:

```shell
mataserver show easyocr

model: easyocr
task: ocr
source: pip
size: —
last_accessed: —
pip_packages: easyocr
installed: yes
```

Example — Tesseract (with system binary check):

```shell
mataserver show tesseract

model: tesseract
task: ocr
source: pip
size: —
last_accessed: —
pip_packages: pytesseract
installed: yes
system_binary: tesseract (yes)
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Model not found in registry |
Remove a model from the registry. This only removes the entry from `model_registry.json` — model weights in the HuggingFace cache are not deleted.
```shell
mataserver rm <MODEL_ID>
```

| Argument | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID or pip backend name to remove from the registry |
Example — HuggingFace model:

```shell
mataserver rm facebook/detr-resnet-50

Removed 'facebook/detr-resnet-50' from the registry.
Note: model weights on disk (HF cache) were not deleted.
```

Example — pip backend:

```shell
mataserver rm easyocr

Removed 'easyocr' from the registry.
Note: pip packages were not uninstalled. Remove manually if needed.
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Removed successfully |
| 1 | Model not found in registry |
Note: The `load` and `stop` subcommands communicate with a running MATASERVER instance over HTTP. They do not work offline. Use `--url` to specify the server address if it differs from the default.
Preload a model into memory without running inference. Useful for eliminating cold-start latency. Sends POST /v1/models/warmup to the running server.
Alias: mataserver warmup
```shell
mataserver load <MODEL_ID> [--url URL] [--api-key KEY]
```

| Argument / Option | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID to warm up |
| `--url URL` | Server base URL (default: `http://localhost:<MATA_SERVER_PORT>` from settings) |
| `--api-key KEY` | API key for auth (default: `MATA_SERVER_API_KEY` environment variable) |
URL resolution: When `--url` is not provided, the CLI reads `MATA_SERVER_HOST` and `MATA_SERVER_PORT` from settings. If host is `0.0.0.0`, `localhost` is used instead.
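The resolution order described above can be sketched in Python. `resolve_base_url` is a hypothetical helper (the default port 8110 is taken from the examples in these docs), not mataserver's actual code:

```python
import os

def resolve_base_url(url=None):
    """An explicit --url wins; otherwise build one from
    MATA_SERVER_HOST / MATA_SERVER_PORT, substituting localhost
    for a 0.0.0.0 bind address."""
    if url:
        return url.rstrip("/")
    host = os.environ.get("MATA_SERVER_HOST", "0.0.0.0")
    port = os.environ.get("MATA_SERVER_PORT", "8110")
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"
```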
Example:
```shell
# Default server address
mataserver load facebook/detr-resnet-50

# Explicit server address and API key
mataserver load facebook/detr-resnet-50 --url http://192.168.1.10:8110 --api-key my-secret

# Using the alias
mataserver warmup facebook/detr-resnet-50
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Model loaded successfully |
| 1 | Connection error or HTTP error from the server |
Unload a model from memory. Sends POST /v1/models/unload to the running server.
```shell
mataserver stop <MODEL_ID> [--url URL] [--api-key KEY]
```

| Argument / Option | Description |
|---|---|
| `MODEL_ID` | HuggingFace repo ID to unload |
| `--url URL` | Server base URL (default: `http://localhost:<MATA_SERVER_PORT>` from settings) |
| `--api-key KEY` | API key for auth (default: `MATA_SERVER_API_KEY` environment variable) |
Example:
```shell
mataserver stop facebook/detr-resnet-50

Model 'facebook/detr-resnet-50' unloaded from memory.
```

If the model was not loaded, the output is:

```
Model 'facebook/detr-resnet-50' was not loaded (nothing to stop).
```

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Model unloaded (or was not loaded) |
| 1 | Connection error or HTTP error from the server |
Print the mataserver version and exit. Equivalent to `mataserver -v` or `mataserver --version`.

```shell
mataserver version

mataserver 0.6.0
```

The `ocr` task supports two categories of model backend:
These are pulled like any other model — weights are downloaded into the HuggingFace cache and source is recorded as "hf":
| Model ID | Notes |
|---|---|
| `stepfun-ai/GOT-OCR-2.0-hf` | GOT-OCR2 — general-purpose OCR |
| `microsoft/trocr-base-printed` | TrOCR — optimised for printed text |
| `microsoft/trocr-base-handwritten` | TrOCR — optimised for handwritten text |
These are installed as Python packages (and optionally require a system binary). source is recorded as "pip". They do not occupy space in the HuggingFace cache.
| Backend name | Pip packages installed | System binary required | Notes |
|---|---|---|---|
| `easyocr` | `easyocr` | None | Supports 80+ languages |
| `paddleocr` | `paddlepaddle`, `paddleocr` | None | High accuracy; larger install |
| `tesseract` | `pytesseract` | `tesseract-ocr` | Requires system binary (see note below) |
Tesseract system binary: `mataserver pull tesseract --task ocr` installs the `pytesseract` Python wrapper but not the `tesseract-ocr` binary itself. Install it separately:

- Debian/Ubuntu: `apt-get install -y tesseract-ocr`
- macOS: `brew install tesseract`
- Windows: Download from UB-Mannheim/tesseract
If the binary is not found at pull time, a warning is printed but the backend is still registered.
```shell
mataserver show easyocr

source: pip
pip_packages: easyocr
installed: yes
```

```shell
mataserver show tesseract

source: pip
pip_packages: pytesseract
installed: yes
system_binary: tesseract (yes)
```

`mataserver rm <backend>` removes the registration entry only. The pip packages are not uninstalled automatically; remove them manually with `pip uninstall <package>` if desired.
Most endpoints require a Bearer token in the Authorization header.
```
Authorization: Bearer <api_key>
```

Authentication behaviour is controlled by `MATA_SERVER_AUTH_MODE`:
| Mode | Behaviour |
|---|---|
| `api_key` | Token required; 401 if missing, 403 if invalid |
| `none` | All requests accepted without a token (dev/test use) |
401 Unauthorized — missing Authorization header:
```json
{ "detail": "Missing Authorization header" }
```

403 Forbidden — token not in the allowed list:

```json
{ "detail": "Invalid API key" }
```

All error responses follow the standard FastAPI detail structure:

```json
{ "detail": "<human-readable message>" }
```

HTTP status codes used across the API:
| Code | Meaning |
|---|---|
| 400 | Bad request (invalid input, decode error) |
| 401 | Unauthorized — missing credentials |
| 403 | Forbidden — invalid credentials |
| 404 | Resource not found |
| 409 | Conflict (e.g. pull already in progress) |
| 500 | Internal server error |
| 503 | Service unavailable (session limit exceeded) |
| 507 | Insufficient storage (out of VRAM/RAM) |
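A client can fold these status codes and the detail structure into a single exception type. The helper below is a sketch, not part of any official mataserver client:

```python
import json

class MataServerError(Exception):
    """Carries the HTTP status code and the FastAPI-style detail message."""
    def __init__(self, status_code, detail):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail

def raise_for_detail(status_code, body):
    """Turn an error response body ({"detail": ...}) into an exception;
    success statuses pass through silently."""
    if status_code < 400:
        return
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, TypeError, AttributeError):
        detail = body  # non-JSON body: use it verbatim
    raise MataServerError(status_code, detail)
```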
Returns server status. No authentication required.
Response 200 OK:
```json
{
  "status": "ok",
  "version": "0.1.0",
  "gpu_available": true
}
```

| Field | Type | Description |
|---|---|---|
| `status` | string | Always `"ok"` when the server is up |
| `version` | string | MATASERVER release version |
| `gpu_available` | boolean | `true` if a CUDA-capable GPU is present |
Example:
```shell
curl http://localhost:8110/v1/health
```

All model endpoints require authentication.
List all models that are currently installed (i.e. present in the HuggingFace cache and registered with the server).
Response 200 OK — array of ModelInfo objects:
```json
[
  {
    "model": "PekingU/rtdetr_v2_r101vd",
    "task": "detect",
    "source": "hf",
    "state": "idle",
    "size_mb": 421.0,
    "memory_mb": 512.0,
    "loaded_at": 1709550000.0,
    "last_used": 1709553600.0
  },
  {
    "model": "easyocr",
    "task": "ocr",
    "source": "pip",
    "state": "unloaded",
    "size_mb": null,
    "memory_mb": null,
    "loaded_at": null,
    "last_used": null
  }
]
```

| Field | Type | Description |
|---|---|---|
| `model` | string | HuggingFace repo ID or pip backend name |
| `task` | string | Inference task (`detect`, `segment`, `classify`, …) |
| `source` | string | `"hf"` for HuggingFace models, `"pip"` for pip-based backends |
| `state` | string | Current lifecycle state (see table below) |
| `size_mb` | number \| null | On-disk size in MB from the HF cache; null for pip backends |
| `memory_mb` | number \| null | Allocated memory in MB (null when unloaded) |
| `loaded_at` | number \| null | Unix timestamp of when the model was loaded |
| `last_used` | number \| null | Unix timestamp of the most recent inference call |
Model state values:
| State | Meaning |
|---|---|
| `unloaded` | Installed but not in memory |
| `loading` | Currently loading weights |
| `ready` | In memory and ready to serve requests |
| `active` | Currently running an inference |
| `idle` | In memory but idle (eligible for keep-alive expiry) |
| `evicted` | Was in memory but was evicted to free resources |
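For client code it can be handy to mirror the ModelInfo shape as a small dataclass. This is an illustrative sketch, not a class exported by mataserver; which states count as "in memory" is inferred from the table above:

```python
from dataclasses import dataclass
from typing import Optional

# States in which the model occupies memory (assumption from the state table).
IN_MEMORY_STATES = {"loading", "ready", "active", "idle"}

@dataclass
class ModelInfo:
    """Client-side mirror of one entry from GET /v1/models."""
    model: str
    task: str
    source: str
    state: str
    size_mb: Optional[float] = None
    memory_mb: Optional[float] = None
    loaded_at: Optional[float] = None
    last_used: Optional[float] = None

    @property
    def in_memory(self) -> bool:
        return self.state in IN_MEMORY_STATES
```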
Example:
```shell
curl -H "Authorization: Bearer $KEY" http://localhost:8110/v1/models
```

Retrieve full details for a specific model.
`model_id` is the HuggingFace repo ID, e.g. `PekingU/rtdetr_v2_r101vd`.
Response 200 OK — ModelInfo object (same shape as the list items above).
Response 404 Not Found:

```json
{ "detail": "Model not found: PekingU/rtdetr_v2_r101vd" }
```

Example:

```shell
curl -H "Authorization: Bearer $KEY" \
  http://localhost:8110/v1/models/PekingU/rtdetr_v2_r101vd
```

Download or install a model and register it with the server. Supports both HuggingFace models (downloaded into `~/.cache/huggingface`) and pip-based OCR backends (installed into the active Python environment). The 202 response is returned once the operation completes.
Request body:
```json
{
  "model": "PekingU/rtdetr_v2_r101vd",
  "task": "detect"
}
```

| Field | Type | Description |
|---|---|---|
| `model` | string | HuggingFace repo ID (`"org/model-name"`) or pip backend name (`"easyocr"`, …) |
| `task` | string | Inference task (`detect`, `segment`, `ocr`, …) |
Response 202 Accepted:
```json
{ "status": "pulled", "model": "PekingU/rtdetr_v2_r101vd" }
```

Error responses:
| Code | Condition |
|---|---|
| 400 | Pull failed (network error, model not found, task mismatch, pip install error) |
| 409 | A pull for the same model is already in progress |
| 500 | Unexpected server error |
Examples:
```shell
# HuggingFace model
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "PekingU/rtdetr_v2_r101vd", "task": "detect"}'

# Pip-based OCR backend
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "easyocr", "task": "ocr"}'
```

Pre-load a model into memory without running inference. Useful for eliminating cold-start latency before the first request.
Request body:
```json
{ "model": "datamata/rtdetr-l" }
```

Response 200 OK:

```json
{ "status": "ready", "model": "datamata/rtdetr-l" }
```

Error responses:
| Code | Condition |
|---|---|
| 404 | Model ref not found in registry |
| 507 | Insufficient VRAM/RAM to load the model |
| 500 | Unexpected server error |
Example:
```shell
curl -X POST http://localhost:8110/v1/models/warmup \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l"}'
```

All inference endpoints require authentication. The server auto-loads the requested model on the first call if it is not already in memory.
All inference endpoints return the same mata.v1 schema:
```json
{
  "schema_version": "mata.v1",
  "task": "detect",
  "model": "datamata/rtdetr-l",
  "timestamp": 1709553600.123,
  "detections": [
    {
      "label": "person",
      "confidence": 0.94,
      "bbox": [120.5, 45.2, 380.1, 520.8]
    }
  ]
}
```

Only the task-relevant field is populated in each response:
| Task | Populated field | Type |
|---|---|---|
| `detect` | `detections` | `Detection[]` |
| `segment` | `segments` | `Segment[]` |
| `classify` | `classifications` | `Classification[]` |
| `pose` | `keypoints` | `Keypoint[]` |
| `ocr` | `text`, `regions` | `string`, `TextRegion[]` |
| `depth` | `depth_map` | base64 float32 array |
| zero-shot | `classifications` | `Classification[]` |
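The task-to-field dispatch in the table above can be sketched in Python. `extract_results` is an illustrative client-side helper, not mataserver code:

```python
# Which response field carries the payload for each task (from the table above).
TASK_FIELDS = {
    "detect": "detections",
    "segment": "segments",
    "classify": "classifications",
    "pose": "keypoints",
    "depth": "depth_map",
    "zero-shot": "classifications",
}

def extract_results(response):
    """Pull the task-relevant payload out of a mata.v1 response dict.
    OCR is the one two-field case (text plus regions)."""
    task = response["task"]
    if task == "ocr":
        return {"text": response.get("text"), "regions": response.get("regions")}
    field = TASK_FIELDS.get(task)
    return response.get(field) if field else None
```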
Detection:
```json
{ "label": "car", "confidence": 0.87, "bbox": [x1, y1, x2, y2] }
```

Segment (same as Detection plus optional mask):

```json
{ "label": "car", "confidence": 0.87, "bbox": [x1, y1, x2, y2], "mask": "<base64>" }
```

Classification:

```json
{ "label": "tabby cat", "confidence": 0.92 }
```

Keypoint:

```json
{ "label": "person", "confidence": 0.88, "bbox": [x1, y1, x2, y2], "keypoints": [[x, y, conf], ...] }
```

TextRegion:

```json
{ "text": "Hello world", "bbox": [x1, y1, x2, y2], "confidence": 0.99 }
```

Run single-shot inference with a JSON body containing a base64-encoded image.
Request body:
```json
{
  "model": "datamata/rtdetr-l",
  "image": "<base64-encoded image bytes>",
  "params": {
    "confidence": 0.4
  }
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model reference (must be installed) |
| `image` | string | ✅ | Base64-encoded image (JPEG or PNG recommended) |
| `params` | object | ❌ | Typed inference parameters (see table below) |
All fields are optional. Only send the parameters relevant to your model's task.
| Field | Type | Tasks | Description | MATA native name |
|---|---|---|---|---|
| `confidence` | float [0.0–1.0] | detect, segment, classify, pose | Confidence threshold | `threshold` |
| `prompts` | string \| string[] | zero-shot detect/segment/classify, SAM3 | Text prompts for zero-shot models | `text_prompts` |
| `prompt` | string | vlm | Text question about the image | `prompt` |
| `system_prompt` | string | vlm | System prompt override | `system_prompt` |
| `max_tokens` | int (>0) | vlm, ocr (HF) | Maximum tokens to generate | `max_new_tokens` |
| `temperature` | float (≥0) | vlm | Sampling temperature (0 = greedy) | `temperature` |
| `top_p` | float [0.0–1.0] | vlm | Nucleus sampling probability | `top_p` |
| `top_k` | int (>0) | classify, vlm | Top-K predictions / sampling | `top_k` |
| `output_mode` | string | vlm | Output mode: `json`, `detect`, `classify`, `describe` | `output_mode` |
| `use_softmax` | bool | classify (CLIP) | Apply softmax normalization | `use_softmax` |
| `language` | string | ocr (Tesseract, Paddle) | OCR language code override | `lang` |
| `ocr_type` | string | ocr (GOT-OCR2) | `ocr` (plain text) or `format` (markdown/LaTeX) | `ocr_type` |
| `detail` | int (0 or 1) | ocr (EasyOCR) | 0 = text-only, 1 = full with bboxes | `detail` |
| `normalize` | bool | depth | Normalize depth map values | `normalize` |
| `target_size` | [int, int] | depth | Output depth map size [height, width] | `target_size` |
| `point_prompts` | [[x,y,label],...] | SAM | Point prompts (label: 1=foreground, 0=background) | `point_prompts` |
| `box_prompts` | [[x1,y1,x2,y2],...] | SAM | Bounding box prompts | `box_prompts` |
| `detection_confidence` | float [0.0–1.0] | pipeline | Detection threshold for GroundingDINO+SAM pipeline | `detection_threshold` |
| `segmentation_confidence` | float [0.0–1.0] | pipeline | Segmentation threshold for GroundingDINO+SAM pipeline | `segmentation_threshold` |
Migration note: The `params` object previously accepted raw MATA-native names (`threshold`, `text_prompts`, `max_new_tokens`, `lang`, `detection_threshold`, `segmentation_threshold`). These are now rejected with `422 Unprocessable Entity`. Use the user-friendly names from the table above. See the full mapping below.
Object Detection:
```json
{
  "model": "datamata/rtdetr-l",
  "image": "<base64>",
  "params": { "confidence": 0.4 }
}
```

Zero-shot Detection:

```json
{
  "model": "datamata/grounding-dino",
  "image": "<base64>",
  "params": { "prompts": "cat . dog . person", "confidence": 0.3 }
}
```

Classification:

```json
{ "model": "datamata/resnet50", "image": "<base64>", "params": { "top_k": 5 } }
```

Zero-shot Classification (CLIP):

```json
{
  "model": "datamata/clip-vit",
  "image": "<base64>",
  "params": { "prompts": ["cat", "dog", "bird"], "top_k": 3 }
}
```

VLM:

```json
{
  "model": "datamata/llava-1.5",
  "image": "<base64>",
  "params": {
    "prompt": "What objects are in this image?",
    "max_tokens": 256,
    "temperature": 0.3
  }
}
```

OCR:

```json
{
  "model": "datamata/easyocr",
  "image": "<base64>",
  "params": { "language": "jpn", "detail": 1 }
}
```

Depth:

```json
{
  "model": "datamata/depth-anything",
  "image": "<base64>",
  "params": { "target_size": [480, 640], "normalize": true }
}
```

Segmentation (SAM):

```json
{
  "model": "datamata/sam2",
  "image": "<base64>",
  "params": { "prompts": "cat", "confidence": 0.5 }
}
```

Response 200 OK — mata.v1 InferResponse (see schema above).
Error responses:
| Code | Condition |
|---|---|
| 400 | Missing image field or invalid base64 |
| 422 | Invalid or unknown params field |
| 404 | Model ref not found / not installed |
| 500 | Engine or runtime error |
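As a client-side convenience, the JSON body for this endpoint (model, base64-encoded image, params) can be assembled in Python. `build_infer_payload` is a hypothetical helper, not part of mataserver:

```python
import base64
import json

def build_infer_payload(model, image_bytes, **params):
    """Assemble the POST /v1/infer JSON body: base64-encode the raw image
    bytes and nest any keyword arguments under "params"."""
    return json.dumps({
        "model": model,
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "params": params,
    })
```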
Example:
```shell
IMAGE_B64=$(base64 -w0 photo.jpg)
curl -X POST http://localhost:8110/v1/infer \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"datamata/rtdetr-l\", \"image\": \"$IMAGE_B64\", \"params\": {\"confidence\": 0.4}}"
```

Run single-shot inference with a multipart form upload. Convenient when sending images directly from disk or a webcam capture without pre-encoding.
Request — multipart/form-data:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model reference |
| `file` | file | ✅ | Image file (JPEG, PNG, etc.) |
| `confidence` | float | ❌ | Confidence threshold [0.0–1.0] |
| `prompts` | string | ❌ | Text prompts for zero-shot (dot-separated) |
| `prompt` | string | ❌ | Text question for VLM |
| `language` | string | ❌ | OCR language code |
| `top_k` | int | ❌ | Number of top predictions |
| `max_tokens` | int | ❌ | Max tokens to generate |
| `temperature` | float | ❌ | Sampling temperature (VLM) |
| `output_mode` | string | ❌ | VLM output mode |
| `ocr_type` | string | ❌ | OCR output type |
| `detail` | int | ❌ | OCR detail level (0 or 1) |
Complex params (`point_prompts`, `box_prompts`, `target_size`, `normalize`, `use_softmax`, `system_prompt`, `top_p`, `detection_confidence`, `segmentation_confidence`) are JSON-endpoint-only and cannot be passed via multipart upload.
Response 200 OK — mata.v1 InferResponse (see schema above).
Error responses:
| Code | Condition |
|---|---|
| 400 | Uploaded file is empty |
| 422 | Invalid form field value |
| 404 | Model ref not found |
| 500 | Engine or runtime error |
Example:
```shell
curl -X POST http://localhost:8110/v1/infer/upload \
  -H "Authorization: Bearer $KEY" \
  -F "model=datamata/rtdetr-l" \
  -F "confidence=0.4" \
  -F "file=@photo.jpg"
```

The `params` object uses user-friendly names that differ from the MATA adapter's internal kwarg names. Sending the old MATA-native names directly will return `422 Unprocessable Entity`.
| User-friendly name (API) | MATA native name (internal) |
|---|---|
| `confidence` | `threshold` |
| `prompts` | `text_prompts` |
| `max_tokens` | `max_new_tokens` |
| `language` | `lang` |
| `detection_confidence` | `detection_threshold` |
| `segmentation_confidence` | `segmentation_threshold` |
All other fields (`prompt`, `system_prompt`, `temperature`, `top_p`, `top_k`, `output_mode`, `use_softmax`, `ocr_type`, `detail`, `normalize`, `target_size`, `point_prompts`, `box_prompts`) pass through unchanged.
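The mapping, and the 422-style rejection of raw native names, can be expressed as a small translation table. `to_native` is an illustrative re-implementation for client-side validation, not the server's code:

```python
# API name -> MATA-native kwarg (from the mapping table above).
API_TO_NATIVE = {
    "confidence": "threshold",
    "prompts": "text_prompts",
    "max_tokens": "max_new_tokens",
    "language": "lang",
    "detection_confidence": "detection_threshold",
    "segmentation_confidence": "segmentation_threshold",
}
NATIVE_ONLY = set(API_TO_NATIVE.values())

def to_native(params):
    """Translate API param names to MATA-native kwargs, rejecting raw
    native names the same way the server answers with 422."""
    out = {}
    for key, value in params.items():
        if key in NATIVE_ONLY:
            raise ValueError(f"unknown field: {key} (use the API name)")
        out[API_TO_NATIVE.get(key, key)] = value
    return out
```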
Sessions are used together with the WebSocket streaming endpoint (/v1/stream/{session_id}).
All session endpoints require authentication.
Create a new streaming session. The server ensures the requested model is loaded before returning.
Request body:
```json
{
  "model": "datamata/rtdetr-l",
  "task": "detect",
  "params": { "confidence": 0.4 }
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model reference |
| `task` | string \| null | ❌ | Inference task override |
| `params` | object | ❌ | Parameters forwarded to `adapter.predict()` on each frame |
Response 201 Created:
```json
{
  "session_id": "sess_3f8a1b2c9d4e",
  "ws_url": "ws://localhost:8110/v1/stream/sess_3f8a1b2c9d4e"
}
```

The `ws_url` uses `wss://` when the server is reached over HTTPS.
Error responses:
| Code | Condition |
|---|---|
| 404 | Model ref not found |
| 503 | Maximum concurrent sessions reached (limit: 10) |
| 507 | Insufficient memory to load the model |
Example:
```shell
curl -X POST http://localhost:8110/v1/sessions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l", "task": "detect", "params": {"confidence": 0.5}}'
```

Close and clean up a streaming session. If the WebSocket is still connected it will be disconnected.
Response 204 No Content — on success (empty body).
Response 404 Not Found:

```json
{ "detail": "Session not found: sess_3f8a1b2c9d4e" }
```

Example:

```shell
curl -X DELETE http://localhost:8110/v1/sessions/sess_3f8a1b2c9d4e \
  -H "Authorization: Bearer $KEY"
```

See streaming.md for the full WebSocket protocol specification.
WebSocket endpoint for real-time frame-by-frame inference. Clients send binary frames and receive JSON inference results. Requires a session created via POST /v1/sessions.
Query parameters:
| Parameter | Required | Description |
|---|---|---|
| `token` | ✅* | API key (required when `auth_mode = api_key`) |
WebSocket close codes:
| Code | Meaning |
|---|---|
| 4001 | Unauthorized — bad or missing ?token= |
| 4004 | Invalid or expired session ID |
| 1011 | Internal server error during inference |
Example connection URL:
```
ws://localhost:8110/v1/stream/sess_3f8a1b2c9d4e?token=my-api-key
```
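Deriving the connection URL from the server's HTTP base URL can be sketched as follows; `stream_url` is a hypothetical client helper:

```python
from urllib.parse import urlencode

def stream_url(base_url, session_id, token=None):
    """Build the WebSocket URL for /v1/stream/{session_id}: http maps to
    ws and https to wss, with ?token= appended when auth is enabled."""
    scheme = "wss" if base_url.startswith("https://") else "ws"
    host = base_url.split("://", 1)[-1].rstrip("/")
    url = f"{scheme}://{host}/v1/stream/{session_id}"
    if token:
        url = f"{url}?{urlencode({'token': token})}"
    return url
```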