
API Reference

All REST endpoints are served under the /v1 prefix.
Interactive OpenAPI docs are available at /docs (Swagger UI) and /redoc.


CLI

MATASERVER ships a mataserver console script with the following subcommands.

mataserver [--version] <COMMAND> [options]
Command Description
serve Start the inference server (Uvicorn)
pull <model> --task T Download a model from HuggingFace and register it
list / ls List all registered models
show <model> Show detailed info for a registered model
rm <model> Remove a model from the registry (does not delete weights)
load <model> / warmup Preload a model into memory (requires running server)
stop <model> Unload a model from memory (requires running server)
version Print the mataserver version and exit

Global flags:

Flag Description
-v / --version Print version and exit

Running mataserver (no subcommand) prints help and exits.


mataserver serve

Start the Uvicorn inference server. Settings are read from environment variables, a .env file, or a YAML config (see configuration).

mataserver serve

mataserver pull

Download a model and register it with the server so it appears in GET /v1/models and can be used for inference. Supports two backend types:

  • HuggingFace models — downloaded via huggingface_hub.snapshot_download() and stored in the standard HF cache (~/.cache/huggingface). These are identified by a slash-separated org/repo-name ID.
  • Pip-based OCR backends — installed via pip into the current Python environment. These are identified by a short backend name (e.g. easyocr, paddleocr, tesseract).
mataserver pull <MODEL_ID> --task <TASK>
Argument Description
MODEL_ID HuggingFace repo ID (org/name) or pip backend name (easyocr, paddleocr, tesseract)
--task Inference task. One of: classify, depth, detect, ocr, pose, segment, track, vlm

After a successful pull the model ID, task, and source type are written to model_registry.json in MATA_SERVER_DATA_DIR.

Examples:

# Object detection (HuggingFace)
mataserver pull facebook/detr-resnet-50 --task=detect

# Image classification (HuggingFace)
mataserver pull google/vit-base-patch16-224 --task=classify

# Depth estimation (HuggingFace)
mataserver pull LiheYoung/depth-anything-base-hf --task=depth

# OCR — HuggingFace models
mataserver pull stepfun-ai/GOT-OCR-2.0-hf --task ocr
mataserver pull microsoft/trocr-base-printed --task ocr

# OCR — pip-based backends
mataserver pull easyocr --task ocr
mataserver pull paddleocr --task ocr
mataserver pull tesseract --task ocr  # also requires the tesseract system binary

Pip OCR backends: easyocr, paddleocr, and tesseract are installed as Python packages into the active virtual environment rather than downloaded from HuggingFace. tesseract additionally requires the tesseract-ocr system binary; if it is not found on PATH a warning is printed but the pull still succeeds. See OCR Backends for details.

Exit codes:

Code Meaning
0 Pull completed successfully
1 Pull failed (network error, unrecognised model, etc.)
2 Argument error (missing or invalid --task)

Tip: You can also pull models via the REST API without stopping the running server — see POST /v1/models/pull.
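The split between HuggingFace repo IDs and pip backend names described above can be sketched as a small helper. This is an illustrative assumption about how the CLI distinguishes the two, not mataserver's actual code; the function name is hypothetical, and the backend set is taken from the table above.

```python
# Hypothetical sketch: classify a MODEL_ID the way `mataserver pull` describes it.
PIP_BACKENDS = {"easyocr", "paddleocr", "tesseract"}

def infer_source(model_id: str) -> str:
    """Return "pip" for a known pip backend name, "hf" for an org/name repo ID."""
    if model_id in PIP_BACKENDS:
        return "pip"
    if "/" in model_id:
        return "hf"
    raise ValueError(f"Unrecognised model ID: {model_id!r}")
```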


mataserver list

List all models registered with the server. Reads from model_registry.json in MATA_SERVER_DATA_DIR.

Alias: mataserver ls

mataserver list

Output — tabular display with columns:

Column Description
MODEL HuggingFace repo ID or pip backend name
TASK Inference task (detect, segment, etc.)
SOURCE hf for HuggingFace models, pip for pip-based backends
SIZE (MB) On-disk size in MB from the HuggingFace cache, or — if unknown

If no models are registered, prints "No models registered.".

Example:

mataserver list

MODEL                        TASK     SOURCE  SIZE (MB)
--------------------------------------------------------------
facebook/detr-resnet-50      detect   hf          167.3
google/vit-base-patch16-224  classify hf          327.5
easyocr                      ocr      pip             —
tesseract                    ocr      pip             —

Exit codes:

Code Meaning
0 Success

mataserver show

Show detailed information for a registered model.

mataserver show <MODEL_ID>
Argument Description
MODEL_ID HuggingFace repo ID (e.g. facebook/detr-resnet-50)

Output fields:

Field Description
model HuggingFace repo ID or pip backend name
task Registered inference task
source hf (HuggingFace) or pip (pip-based backend)
size On-disk size in MB from HF cache, or — (pip models have no cache)
last_accessed Timestamp of last HF cache access, or — (pip models)
pip_packages (pip only) Comma-separated list of installed pip packages
installed (pip only) yes / no — whether the package is importable
system_binary (pip only, if applicable) Binary name and whether it was found

Example — HuggingFace model:

mataserver show facebook/detr-resnet-50

  model:         facebook/detr-resnet-50
  task:          detect
  source:        hf
  size:          167.30 MB
  last_accessed: 2026-03-05 14:22:01

Example — pip backend:

mataserver show easyocr

  model:         easyocr
  task:          ocr
  source:        pip
  size:          —
  last_accessed: —
  pip_packages:  easyocr
  installed:     yes

Example — Tesseract (with system binary check):

mataserver show tesseract

  model:         tesseract
  task:          ocr
  source:        pip
  size:          —
  last_accessed: —
  pip_packages:  pytesseract
  installed:     yes
  system_binary: tesseract (yes)

Exit codes:

Code Meaning
0 Success
1 Model not found in registry

mataserver rm

Remove a model from the registry. This only removes the entry from model_registry.json — model weights in the HuggingFace cache are not deleted.

mataserver rm <MODEL_ID>
Argument Description
MODEL_ID HuggingFace repo ID to remove from the registry

Example — HuggingFace model:

mataserver rm facebook/detr-resnet-50
Removed 'facebook/detr-resnet-50' from the registry.
Note: model weights on disk (HF cache) were not deleted.

Example — pip backend:

mataserver rm easyocr
Removed 'easyocr' from the registry.
Note: pip packages were not uninstalled. Remove manually if needed.

Exit codes:

Code Meaning
0 Removed successfully
1 Model not found in registry

Note: The load and stop subcommands communicate with a running MATASERVER instance over HTTP. They do not work offline. Use --url to specify the server address if it differs from the default.

mataserver load

Preload a model into memory without running inference. Useful for eliminating cold-start latency. Sends POST /v1/models/warmup to the running server.

Alias: mataserver warmup

mataserver load <MODEL_ID> [--url URL] [--api-key KEY]
Argument / Option Description
MODEL_ID HuggingFace repo ID to warm up
--url URL Server base URL (default: http://localhost:<MATA_SERVER_PORT> from settings)
--api-key KEY API key for auth (default: MATA_SERVER_API_KEY environment variable)

URL resolution: When --url is not provided, the CLI reads MATA_SERVER_HOST and MATA_SERVER_PORT from settings. If host is 0.0.0.0, localhost is used instead.

Example:

# Default server address
mataserver load facebook/detr-resnet-50

# Explicit server address and API key
mataserver load facebook/detr-resnet-50 --url http://192.168.1.10:8110 --api-key my-secret

# Using the alias
mataserver warmup facebook/detr-resnet-50

Exit codes:

Code Meaning
0 Model loaded successfully
1 Connection error or HTTP error from the server

mataserver stop

Unload a model from memory. Sends POST /v1/models/unload to the running server.

mataserver stop <MODEL_ID> [--url URL] [--api-key KEY]
Argument / Option Description
MODEL_ID HuggingFace repo ID to unload
--url URL Server base URL (default: http://localhost:<MATA_SERVER_PORT> from settings)
--api-key KEY API key for auth (default: MATA_SERVER_API_KEY environment variable)

Example:

mataserver stop facebook/detr-resnet-50
Model 'facebook/detr-resnet-50' unloaded from memory.

If the model was not loaded, the output is:

Model 'facebook/detr-resnet-50' was not loaded (nothing to stop).

Exit codes:

Code Meaning
0 Model unloaded (or was not loaded)
1 Connection error or HTTP error from the server

mataserver version

Print the mataserver version and exit. Equivalent to mataserver -v or mataserver --version.

mataserver version
mataserver 0.6.0

OCR Backends

The ocr task supports two categories of model backend:

HuggingFace OCR models

These are pulled like any other model — weights are downloaded into the HuggingFace cache and source is recorded as "hf":

Model ID Notes
stepfun-ai/GOT-OCR-2.0-hf GOT-OCR2 — general-purpose OCR
microsoft/trocr-base-printed TrOCR — optimised for printed text
microsoft/trocr-base-handwritten TrOCR — optimised for handwritten text

Pip-based OCR backends

These are installed as Python packages (and optionally require a system binary). source is recorded as "pip". They do not occupy space in the HuggingFace cache.

Backend name Pip packages installed System binary required Notes
easyocr easyocr None Supports 80+ languages
paddleocr paddlepaddle, paddleocr None High accuracy; larger install
tesseract pytesseract tesseract-ocr Requires system binary (see note below)

Tesseract system binary: mataserver pull tesseract --task ocr installs the pytesseract Python wrapper but not the tesseract-ocr binary itself. Install it separately:

  • Debian/Ubuntu: apt-get install -y tesseract-ocr
  • macOS: brew install tesseract
  • Windows: Download from UB-Mannheim/tesseract

If the binary is not found at pull time, a warning is printed but the backend is still registered.
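The pull-time check presumably amounts to a PATH lookup. A minimal stdlib sketch, assuming that behaviour (the function name and warning text are illustrative):

```python
import shutil

def check_system_binary(name: str) -> bool:
    """Return True if `name` is found on PATH; warn (but do not fail) otherwise."""
    found = shutil.which(name) is not None
    if not found:
        print(f"Warning: system binary '{name}' not found on PATH; "
              f"the backend is registered anyway.")
    return found
```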

Checking backend status

mataserver show easyocr
  source:        pip
  pip_packages:  easyocr
  installed:     yes

mataserver show tesseract
  source:        pip
  pip_packages:  pytesseract
  installed:     yes
  system_binary: tesseract (yes)

Removing a pip backend

mataserver rm <backend> removes the registration entry only. The pip packages are not uninstalled automatically; remove them manually with pip uninstall <package> if desired.


Authentication

Most endpoints require a Bearer token in the Authorization header.

Authorization: Bearer <api_key>

Authentication behaviour is controlled by MATA_SERVER_AUTH_MODE:

Mode Behaviour
api_key Token required; 401 if missing, 403 if invalid
none All requests accepted without a token (dev/test use)

Error responses for auth failures

401 Unauthorized — missing Authorization header:

{ "detail": "Missing Authorization header" }

403 Forbidden — token not in the allowed list:

{ "detail": "Invalid API key" }

Common error format

All error responses follow the standard FastAPI detail structure:

{ "detail": "<human-readable message>" }

HTTP status codes used across the API:

Code Meaning
400 Bad request (invalid input, decode error)
401 Unauthorized — missing credentials
403 Forbidden — invalid credentials
404 Resource not found
409 Conflict (e.g. pull already in progress)
500 Internal server error
503 Service unavailable (session limit exceeded)
507 Insufficient storage (out of VRAM/RAM)

Health

GET /v1/health

Returns server status. No authentication required.

Response 200 OK:

{
  "status": "ok",
  "version": "0.1.0",
  "gpu_available": true
}
Field Type Description
status string Always "ok" when the server is up
version string MATASERVER release version
gpu_available boolean true if a CUDA-capable GPU is present

Example:

curl http://localhost:8110/v1/health

Models

All model endpoints require authentication.

GET /v1/models

List all models that are currently installed (i.e. present in the HuggingFace cache and registered with the server).

Response 200 OK — array of ModelInfo objects:

[
  {
    "model": "PekingU/rtdetr_v2_r101vd",
    "task": "detect",
    "source": "hf",
    "state": "idle",
    "size_mb": 421.0,
    "memory_mb": 512.0,
    "loaded_at": 1709550000.0,
    "last_used": 1709553600.0
  },
  {
    "model": "easyocr",
    "task": "ocr",
    "source": "pip",
    "state": "unloaded",
    "size_mb": null,
    "memory_mb": null,
    "loaded_at": null,
    "last_used": null
  }
]
Field Type Description
model string HuggingFace repo ID or pip backend name
task string Inference task (detect, segment, classify, …)
source string "hf" for HuggingFace models, "pip" for pip-based backends
state string Current lifecycle state (see table below)
size_mb number | null On-disk size in MB from the HF cache; null for pip backends
memory_mb number | null Allocated memory in MB (null when unloaded)
loaded_at number | null Unix timestamp of when the model was loaded
last_used number | null Unix timestamp of the most recent inference call

Model state values:

State Meaning
unloaded Installed but not in memory
loading Currently loading weights
ready In memory and ready to serve requests
active Currently running an inference
idle In memory but idle (eligible for keep-alive expiry)
evicted Was in memory but was evicted to free resources

Example:

curl -H "Authorization: Bearer $KEY" http://localhost:8110/v1/models

GET /v1/models/{model_id}

Retrieve full details for a specific model.

model_id is the HuggingFace repo ID, e.g. PekingU/rtdetr_v2_r101vd.

Response 200 OK — ModelInfo object (same shape as the list items above).

Response 404 Not Found:

{ "detail": "Model not found: PekingU/rtdetr_v2_r101vd" }

Example:

curl -H "Authorization: Bearer $KEY" \
     http://localhost:8110/v1/models/PekingU/rtdetr_v2_r101vd

POST /v1/models/pull

Download or install a model and register it with the server. Supports both HuggingFace models (downloaded into ~/.cache/huggingface) and pip-based OCR backends (installed into the active Python environment). Note that the pull runs synchronously: the 202 Accepted response is returned only once the operation has completed.

Request body:

{
  "model": "PekingU/rtdetr_v2_r101vd",
  "task": "detect"
}
Field Type Description
model string HuggingFace repo ID ("org/model-name") or pip backend name ("easyocr", …)
task string Inference task (detect, segment, ocr, …)

Response 202 Accepted:

{ "status": "pulled", "model": "PekingU/rtdetr_v2_r101vd" }

Error responses:

Code Condition
400 Pull failed (network error, model not found, task mismatch, pip install error)
409 A pull for the same model is already in progress
500 Unexpected server error

Examples:

# HuggingFace model
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "PekingU/rtdetr_v2_r101vd", "task": "detect"}'

# Pip-based OCR backend
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "easyocr", "task": "ocr"}'

POST /v1/models/warmup

Pre-load a model into memory without running inference. Useful for eliminating cold-start latency before the first request.

Request body:

{ "model": "datamata/rtdetr-l" }

Response 200 OK:

{ "status": "ready", "model": "datamata/rtdetr-l" }

Error responses:

Code Condition
404 Model ref not found in registry
507 Insufficient VRAM/RAM to load the model
500 Unexpected server error

Example:

curl -X POST http://localhost:8110/v1/models/warmup \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l"}'

Inference

All inference endpoints require authentication. The server auto-loads the requested model on the first call if it is not already in memory.

Response schema (mata.v1)

All inference endpoints return the same mata.v1 schema:

{
  "schema_version": "mata.v1",
  "task": "detect",
  "model": "datamata/rtdetr-l",
  "timestamp": 1709553600.123,
  "detections": [
    {
      "label": "person",
      "confidence": 0.94,
      "bbox": [120.5, 45.2, 380.1, 520.8]
    }
  ]
}

Only the task-relevant field is populated in each response:

Task Populated field Type
detect detections Detection[]
segment segments Segment[]
classify classifications Classification[]
pose keypoints Keypoint[]
ocr text, regions string, TextRegion[]
depth depth_map base64 float32 array
zero-shot classifications Classification[]

Detection:

{ "label": "car", "confidence": 0.87, "bbox": [x1, y1, x2, y2] }

Segment (same as Detection plus optional mask):

{ "label": "car", "confidence": 0.87, "bbox": [x1, y1, x2, y2], "mask": "<base64>" }

Classification:

{ "label": "tabby cat", "confidence": 0.92 }

Keypoint:

{ "label": "person", "confidence": 0.88, "bbox": [x1, y1, x2, y2], "keypoints": [[x, y, conf], ...] }

TextRegion:

{ "text": "Hello world", "bbox": [x1, y1, x2, y2], "confidence": 0.99 }
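Since only the task-relevant field is populated, a client can select it by task. A sketch assuming the field names from the table above (the helper and lookup table are illustrative, not part of mataserver):

```python
# Hypothetical helper: pick out the populated field(s) of a mata.v1 response.
TASK_FIELDS = {
    "detect": ["detections"],
    "segment": ["segments"],
    "classify": ["classifications"],
    "zero-shot": ["classifications"],
    "pose": ["keypoints"],
    "ocr": ["text", "regions"],
    "depth": ["depth_map"],
}

def extract_results(response: dict):
    """Return the task-relevant payload(s) from a mata.v1 response dict."""
    fields = TASK_FIELDS[response["task"]]
    values = [response.get(f) for f in fields]
    return values[0] if len(values) == 1 else tuple(values)
```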

POST /v1/infer

Run single-shot inference with a JSON body containing a base64-encoded image.

Request body:

{
  "model": "datamata/rtdetr-l",
  "image": "<base64-encoded image bytes>",
  "params": {
    "confidence": 0.4
  }
}
Field Type Required Description
model string ✅ Model reference (must be installed)
image string ✅ Base64-encoded image (JPEG or PNG recommended)
params object — Typed inference parameters (see table below)

Inference Parameters (params object)

All fields are optional. Only send the parameters relevant to your model's task.

Field Type Tasks Description MATA native name
confidence float [0.0–1.0] detect, segment, classify, pose Confidence threshold threshold
prompts string | string[] zero-shot detect/segment/classify, SAM3 Text prompts for zero-shot models text_prompts
prompt string vlm Text question about the image prompt
system_prompt string vlm System prompt override system_prompt
max_tokens int (>0) vlm, ocr (HF) Maximum tokens to generate max_new_tokens
temperature float (≥0) vlm Sampling temperature (0 = greedy) temperature
top_p float [0.0–1.0] vlm Nucleus sampling probability top_p
top_k int (>0) classify, vlm Top-K predictions / sampling top_k
output_mode string vlm Output mode: json, detect, classify, describe output_mode
use_softmax bool classify (CLIP) Apply softmax normalization use_softmax
language string ocr (Tesseract, Paddle) OCR language code override lang
ocr_type string ocr (GOT-OCR2) ocr (plain text) or format (markdown/LaTeX) ocr_type
detail int (0 or 1) ocr (EasyOCR) 0 = text-only, 1 = full with bboxes detail
normalize bool depth Normalize depth map values normalize
target_size [int, int] depth Output depth map size [height, width] target_size
point_prompts [[x,y,label],...] SAM Point prompts (label: 1=foreground, 0=background) point_prompts
box_prompts [[x1,y1,x2,y2],...] SAM Bounding box prompts box_prompts
detection_confidence float [0.0–1.0] pipeline Detection threshold for GroundingDINO+SAM pipeline detection_threshold
segmentation_confidence float [0.0–1.0] pipeline Segmentation threshold for GroundingDINO+SAM pipeline segmentation_threshold

Migration note: The params object previously accepted raw MATA-native names (threshold, text_prompts, max_new_tokens, lang, detection_threshold, segmentation_threshold). These are now rejected with 422 Unprocessable Entity. Use the user-friendly names from the table above. See the full mapping below.

Per-task request examples

Object Detection:

{
  "model": "datamata/rtdetr-l",
  "image": "<base64>",
  "params": { "confidence": 0.4 }
}

Zero-shot Detection:

{
  "model": "datamata/grounding-dino",
  "image": "<base64>",
  "params": { "prompts": "cat . dog . person", "confidence": 0.3 }
}

Classification:

{ "model": "datamata/resnet50", "image": "<base64>", "params": { "top_k": 5 } }

Zero-shot Classification (CLIP):

{
  "model": "datamata/clip-vit",
  "image": "<base64>",
  "params": { "prompts": ["cat", "dog", "bird"], "top_k": 3 }
}

VLM:

{
  "model": "datamata/llava-1.5",
  "image": "<base64>",
  "params": {
    "prompt": "What objects are in this image?",
    "max_tokens": 256,
    "temperature": 0.3
  }
}

OCR:

{
  "model": "datamata/easyocr",
  "image": "<base64>",
  "params": { "language": "jpn", "detail": 1 }
}

Depth:

{
  "model": "datamata/depth-anything",
  "image": "<base64>",
  "params": { "target_size": [480, 640], "normalize": true }
}

Segmentation (SAM):

{
  "model": "datamata/sam2",
  "image": "<base64>",
  "params": { "prompts": "cat", "confidence": 0.5 }
}

Response 200 OK — mata.v1 InferResponse (see schema above).

Error responses:

Code Condition
400 Missing image field or invalid base64
422 Invalid or unknown params field
404 Model ref not found / not installed
500 Engine or runtime error

Example:

IMAGE_B64=$(base64 -w0 photo.jpg)
curl -X POST http://localhost:8110/v1/infer \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"datamata/rtdetr-l\", \"image\": \"$IMAGE_B64\", \"params\": {\"confidence\": 0.4}}"
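The same request body can be assembled in Python with only the stdlib. The payload shape comes from this section; the helper name is illustrative.

```python
import base64
import json

def build_infer_payload(model: str, image_bytes: bytes, **params) -> str:
    """Base64-encode raw image bytes and assemble the /v1/infer JSON body."""
    body = {
        "model": model,
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "params": params,
    }
    return json.dumps(body)
```

The resulting string can be POSTed with any HTTP client, together with the Authorization and Content-Type: application/json headers shown above.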

POST /v1/infer/upload

Run single-shot inference with a multipart form upload. Convenient when sending images directly from disk or a webcam capture without pre-encoding.

Request — multipart/form-data:

Field Type Required Description
model string ✅ Model reference
file file ✅ Image file (JPEG, PNG, etc.)
confidence float — Confidence threshold [0.0–1.0]
prompts string — Text prompts for zero-shot (dot-separated)
prompt string — Text question for VLM
language string — OCR language code
top_k int — Number of top predictions
max_tokens int — Max tokens to generate
temperature float — Sampling temperature (VLM)
output_mode string — VLM output mode
ocr_type string — OCR output type
detail int — OCR detail level (0 or 1)

Complex params (point_prompts, box_prompts, target_size, normalize, use_softmax, system_prompt, top_p, detection_confidence, segmentation_confidence) are JSON-endpoint-only and cannot be passed via multipart upload.

Response 200 OK — mata.v1 InferResponse (see schema above).

Error responses:

Code Condition
400 Uploaded file is empty
422 Invalid form field value
404 Model ref not found
500 Engine or runtime error

Example:

curl -X POST http://localhost:8110/v1/infer/upload \
  -H "Authorization: Bearer $KEY" \
  -F "model=datamata/rtdetr-l" \
  -F "confidence=0.4" \
  -F "file=@photo.jpg"

Param name migration

The params object uses user-friendly names that differ from the MATA adapter's internal kwarg names. Sending the old MATA-native names directly will return 422 Unprocessable Entity.

User-friendly name (API) MATA native name (internal)
confidence threshold
prompts text_prompts
max_tokens max_new_tokens
language lang
detection_confidence detection_threshold
segmentation_confidence segmentation_threshold

All other fields (prompt, system_prompt, temperature, top_p, top_k, output_mode, use_softmax, ocr_type, detail, normalize, target_size, point_prompts, box_prompts) pass through unchanged.
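The mapping table above can be applied mechanically when migrating old payloads. A sketch (the helper name is illustrative; the key pairs come from the table):

```python
# Migration helper based on the mapping table above.
NATIVE_TO_API = {
    "threshold": "confidence",
    "text_prompts": "prompts",
    "max_new_tokens": "max_tokens",
    "lang": "language",
    "detection_threshold": "detection_confidence",
    "segmentation_threshold": "segmentation_confidence",
}

def migrate_params(params: dict) -> dict:
    """Rename MATA-native keys to the API names; pass all other keys through."""
    return {NATIVE_TO_API.get(k, k): v for k, v in params.items()}
```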


Sessions

Sessions are used together with the WebSocket streaming endpoint (/v1/stream/{session_id}).
All session endpoints require authentication.

POST /v1/sessions

Create a new streaming session. The server ensures the requested model is loaded before returning.

Request body:

{
  "model": "datamata/rtdetr-l",
  "task": "detect",
  "params": { "confidence": 0.4 }
}
Field Type Required Description
model string Model reference
task string | null Inference task override
params object Parameters forwarded to adapter.predict() on each frame

Response 201 Created:

{
  "session_id": "sess_3f8a1b2c9d4e",
  "ws_url": "ws://localhost:8110/v1/stream/sess_3f8a1b2c9d4e"
}

The ws_url uses wss:// when the server is reached over HTTPS.

Error responses:

Code Condition
404 Model ref not found
503 Maximum concurrent sessions reached (limit: 10)
507 Insufficient memory to load the model

Example:

curl -X POST http://localhost:8110/v1/sessions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l", "task": "detect", "params": {"confidence": 0.5}}'

DELETE /v1/sessions/{session_id}

Close and clean up a streaming session. If the WebSocket is still connected it will be disconnected.

Response 204 No Content — on success (empty body).

Response 404 Not Found:

{ "detail": "Session not found: sess_3f8a1b2c9d4e" }

Example:

curl -X DELETE http://localhost:8110/v1/sessions/sess_3f8a1b2c9d4e \
  -H "Authorization: Bearer $KEY"

Streaming

See streaming.md for the full WebSocket protocol specification.

WS /v1/stream/{session_id}

WebSocket endpoint for real-time frame-by-frame inference. Clients send binary frames and receive JSON inference results. Requires a session created via POST /v1/sessions.

Query parameters:

Parameter Required Description
token ✅* API key (required when auth_mode = api_key)

WebSocket close codes:

Code Meaning
4001 Unauthorized — bad or missing ?token=
4004 Invalid or expired session ID
1011 Internal server error during inference

Example connection URL:

ws://localhost:8110/v1/stream/sess_3f8a1b2c9d4e?token=my-api-key