12 changes: 12 additions & 0 deletions README.md
@@ -249,6 +249,18 @@ asyncio.run(stream())

See [docs/streaming.md](docs/streaming.md) for the full binary frame protocol specification and a complete async client example.

### Runnable example scripts

The [`examples/`](examples/) directory contains ready-to-run Python clients:

| Script | Description |
| ---------------------------------------------------------- | -------------------------------------------------- |
| [`examples/rest_infer.py`](examples/rest_infer.py) | REST inference — detect, classify, segment |
| [`examples/rest_vlm.py`](examples/rest_vlm.py) | REST inference — visual language model (VLM) |
| [`examples/ws_video_infer.py`](examples/ws_video_infer.py) | WebSocket video streaming — frame-by-frame results |

See [`examples/README.md`](examples/README.md) for full usage, argument reference, and sample output for each script.

---

## Development Setup
257 changes: 257 additions & 0 deletions examples/README.md
@@ -0,0 +1,257 @@
# Examples

This directory contains runnable client examples for MATA-SERVER.

| Script | Transport | Tasks covered |
| ---------------------------------------- | -------------------------------- | -------------------------------------- |
| [`rest_infer.py`](#rest_inferpy) | REST (`POST /v1/infer`) | `detect`, `classify`, `segment` |
| [`rest_vlm.py`](#rest_vlmpy) | REST (`POST /v1/infer`) | `vlm` |
| [`ws_video_infer.py`](#ws_video_inferpy) | WebSocket (`WS /v1/stream/{id}`) | `detect`, `segment`, `classify`, `vlm` |

> **Prerequisites** — a MATA-SERVER instance must be reachable before running any example.
> Start one locally with:
>
> ```bash
> MATA_SERVER_AUTH_MODE=none mataserver serve
> ```
>
> All examples default to `127.0.0.1:8110`. Pass `--host` / `--port` to override.

---

## rest_infer.py

Runs single-shot inference against the REST API using a **base64-encoded image** payload.
Covers the three classic vision tasks — object detection, image classification, and instance segmentation.

### Requirements

```bash
pip install requests
```

### Usage

```bash
# Run all three tasks with the default model set
python examples/rest_infer.py --image examples/images/coco_cat_remote.jpg

# Run a single task
python examples/rest_infer.py \
--image examples/images/coco_cat_remote.jpg \
--task detect \
--model PekingU/rtdetr_r18vd

# Zero-shot open-vocabulary detection with text prompts
python examples/rest_infer.py \
--image examples/images/coco_cat_remote.jpg \
--task detect \
--model google/owlv2-base-patch16-ensemble \
--prompts "cat,dog,remote control"
```

### Arguments

| Argument | Default | Description |
| ----------- | ------------ | -------------------------------------------------------------------- |
| `--image` | _(required)_ | Path to an image file |
| `--task` | all three | One of `detect`, `classify`, `segment` |
| `--model` | task default | HuggingFace model repo ID; defaults to the bundled task-to-model map |
| `--prompts` | — | Comma-separated text prompts for zero-shot / open-vocabulary models |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |

### Default models

| Task | Default model |
| ---------- | ---------------------------------------------- |
| `detect` | `PekingU/rtdetr_r18vd` |
| `classify` | `google/vit-base-patch16-224` |
| `segment` | `facebook/mask2former-swin-tiny-coco-instance` |

### Sample output

```
--- Task: DETECT | Model: PekingU/rtdetr_r18vd ---
3 detection(s)
[0.94] cat bbox=[42, 10, 380, 470]
[0.81] remote bbox=[200, 300, 260, 420]
[0.57] couch bbox=[0, 250, 480, 480]
Full response keys: ['schema_version', 'task', 'model', 'timestamp', 'detections']

--- Task: CLASSIFY | Model: google/vit-base-patch16-224 ---
5 class(es)
[0.84] tabby cat
[0.07] Egyptian cat
...

--- Task: SEGMENT | Model: facebook/mask2former-swin-tiny-coco-instance ---
4 segment(s)
[0.91] cat bbox=[42, 10, 380, 470]
...
```

---

## rest_vlm.py

Sends an image and a natural-language prompt to a **Visual Language Model** (VLM) via the REST API and prints the generated response text.

### Requirements

```bash
pip install requests
```

### Usage

```bash
# Basic question about an image
python examples/rest_vlm.py \
--image examples/images/coco_cat_remote.jpg \
--prompt "What do you see in this image?"

# Control generation parameters
python examples/rest_vlm.py \
--image examples/images/coco_cat_remote.jpg \
--prompt "List every object you can identify." \
--max-tokens 256 \
--temperature 0.3

# Use a different VLM
python examples/rest_vlm.py \
--image examples/images/coco_cat_remote.jpg \
--prompt "Describe the scene in one sentence." \
--model Qwen/Qwen2.5-VL-7B-Instruct
```

### Arguments

| Argument | Default | Description |
| --------------- | ----------------------------- | ------------------------------------------------------------- |
| `--image` | _(required)_ | Path to an image file |
| `--prompt` | `"Describe this image."` | Natural-language question or instruction |
| `--model` | `Qwen/Qwen2.5-VL-3B-Instruct` | VLM model repo ID |
| `--max-tokens` | — | Maximum number of tokens to generate |
| `--temperature` | — | Sampling temperature (`0.0` = greedy, higher = more creative) |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |

### Sample output

```
--- VLM Inference ---
Model : Qwen/Qwen2.5-VL-3B-Instruct
Prompt : 'What do you see in this image?'

Response:
The image shows a cat sitting on a couch next to a remote control. The cat appears to be relaxed and is looking towards the camera.
```

---

## ws_video_infer.py

Streams a local video file to MATA-SERVER over a **WebSocket connection** and prints inference results frame-by-frame as they arrive.
Implements the full session lifecycle:

1. `POST /v1/sessions` — create a streaming session and receive a `session_id`
2. `WS /v1/stream/{session_id}` — connect and stream binary-encoded frames
3. `DELETE /v1/sessions/{session_id}` — clean up after streaming ends

Frames are encoded using the MATA binary wire format: a 13-byte header (`frame_id` uint32 BE + `timestamp` float64 BE + `encoding` uint8) followed by JPEG bytes.
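The header layout above maps directly onto Python's `struct` module. A minimal sketch of the framing (`>IdB` = big-endian uint32 + float64 + uint8 = 4 + 8 + 1 = 13 bytes); the encoding value `1` for JPEG is an assumption here, so check `docs/streaming.md` for the actual codec codes:

```python
import struct

ENCODING_JPEG = 1  # assumed codec identifier -- see docs/streaming.md


def encode_frame(frame_id: int, timestamp: float, jpeg: bytes) -> bytes:
    """Pack one frame in the MATA binary wire format:
    13-byte big-endian header followed by the JPEG payload."""
    return struct.pack(">IdB", frame_id, timestamp, ENCODING_JPEG) + jpeg


def decode_header(frame: bytes) -> tuple[int, float, int]:
    """Unpack (frame_id, timestamp, encoding) from the first 13 bytes."""
    return struct.unpack(">IdB", frame[:13])
```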

### Requirements

```bash
pip install aiohttp opencv-python
```

### Usage

```bash
# Object detection on a video
python examples/ws_video_infer.py \
--video examples/videos/cup.mp4 \
--task detect

# Limit to first 60 frames and cap the send rate
python examples/ws_video_infer.py \
--video examples/videos/cup.mp4 \
--task detect \
--max-frames 60 \
--fps-limit 15

# Use the "latest" frame policy — server always processes the newest frame
# (drops intermediate frames when inference is slower than send rate)
python examples/ws_video_infer.py \
--video examples/videos/cup.mp4 \
--model PekingU/rtdetr_r18vd \
--task detect \
--frame-policy latest

# Authenticated server
python examples/ws_video_infer.py \
--video examples/videos/cup.mp4 \
--task detect \
--api-key my-secret-key
```

### Arguments

| Argument | Default | Description |
| ---------------- | ---------------------- | ---------------------------------------------------------------------- |
| `--video` | _(required)_ | Path to a video file (`mp4`, `avi`, etc.) |
| `--task` | _(required)_ | Inference task: `detect`, `segment`, `classify`, `vlm`, etc. |
| `--model` | `PekingU/rtdetr_r18vd` | HuggingFace model repo ID |
| `--max-frames` | `0` (all) | Maximum frames to send; `0` = entire video |
| `--fps-limit` | `0` (native) | Cap send rate in frames per second; `0` = no limit |
| `--frame-policy` | `queue` | `queue` (process every frame in order) or `latest` (skip stale frames) |
| `--api-key` | — | Bearer token for authenticated servers |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |

### Frame policies

| Policy | Behaviour | Best for |
| -------- | ------------------------------------------------------------------------------------------ | ----------------------------------------- |
| `queue` | Every frame is queued and processed in order. No frames are dropped. | Offline analysis, accuracy-critical tasks |
| `latest` | When the server is busy, older queued frames are dropped and only the most recent is kept. | Real-time / live-stream scenarios |
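The difference between the two policies can be modelled with plain Python deques. This is only a conceptual sketch of the buffering behaviour, not MATA-SERVER's actual implementation:

```python
from collections import deque

# `queue`: unbounded buffer -- every frame is kept, in arrival order.
queue_policy = deque()
# `latest`: capacity 1 -- each new frame evicts any frame still waiting.
latest_policy = deque(maxlen=1)

# Simulate 5 frames arriving while inference is too slow to drain either.
for frame_id in range(5):
    queue_policy.append(frame_id)
    latest_policy.append(frame_id)

print(list(queue_policy))   # [0, 1, 2, 3, 4] -- all frames, in order
print(list(latest_policy))  # [4] -- only the newest frame survives
```

`queue` therefore adds latency under load but never loses data, while `latest` keeps results fresh at the cost of dropped frames.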

### Sample output

```
[1/3] Creating session model='PekingU/rtdetr_r18vd' task='detect' frame_policy='queue'
session_id : sess_a1b2c3d4e5f6

[2/3] Streaming 120/120 frames @ 30.0 fps
[frame 0] 2 detections
[frame 1] 2 detections
[frame 2] 3 detections
...
Sent 120 frames. Waiting for results…

Sent : 120 frames in 4.01s (29.9 fps)
Received: 118 results | 0 dropped | 0 errors

[3/3] Deleting session sess_a1b2c3d4e5f6
Session deleted (204)
```

---

## Sample assets

| File | Description |
| ---------------------------- | --------------------------------------------------------------------------------------- |
| `images/coco_cat_remote.jpg` | COCO-style photo with a cat and a TV remote — used by `rest_infer.py` and `rest_vlm.py` |
| `videos/cup.mp4` | Short clip of a cup — used by `ws_video_infer.py` |

---

## Further reading

- [API reference](../docs/api.md) — full endpoint specs and request/response schemas
- [Streaming protocol](../docs/streaming.md) — binary frame format and WebSocket lifecycle
- [Deployment guide](../docs/deployment.md) — Docker, GPU, and production configuration
- [Root README](../README.md) — project overview, quick-start, and CLI reference
5 changes: 5 additions & 0 deletions mataserver/main.py
@@ -107,6 +107,11 @@ def create_app() -> FastAPI:

app.include_router(api_router, prefix="/v1")

@app.get("/", tags=["root"], include_in_schema=False)
async def root() -> dict[str, str]:
"""Minimal liveness check — intentionally lightweight, no auth required."""
return {"status": "running", "message": "mataserver is running"}

return app

