27 changes: 25 additions & 2 deletions .github/copilot-instructions.md
@@ -21,7 +21,7 @@ Circular dependency is intentionally broken by patching `memory._on_evict = load`

All services live on `app.state`: `settings`, `registry`, `runtime_manager`, `session_manager`. Access them in route handlers via `request.app.state.<name>`.

**ModelRegistry** maps HuggingFace repo IDs → tasks in a JSON sidecar (`data_dir/model_registry.json`). Models are fetched from the HF cache via `huggingface_hub`; the registry is populated by `POST /v1/models/pull`.
**ModelRegistry** maps model IDs → `{task, source}` in a JSON sidecar (`data_dir/model_registry.json`). HuggingFace models (`source="hf"`) are fetched via `huggingface_hub`; pip-based OCR backends (`source="pip"`) are installed via `mataserver/core/pip_installer.py`. The registry supports both old flat format (`{"model": "task"}`) and new dict-of-dicts format (`{"model": {"task": "...", "source": "..."}}`), auto-migrating on read. The registry is populated by `POST /v1/models/pull`.
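
The read-time auto-migration described above can be sketched as follows. This is a hypothetical standalone helper, not the actual `ModelRegistry` code; the function name and the source-inference heuristic (IDs containing `/` are HF repos, bare names are pip backends) are assumptions:

```python
def migrate(raw: dict) -> dict:
    """Normalize a registry mapping to the dict-of-dicts format.

    Old flat entries like {"model": "task"} become
    {"model": {"task": "...", "source": "..."}}; entries already in the
    new format pass through unchanged.
    """
    migrated = {}
    for model_id, value in raw.items():
        if isinstance(value, str):  # old flat format: value is the task name
            source = "hf" if "/" in model_id else "pip"
            migrated[model_id] = {"task": value, "source": source}
        else:  # already dict-of-dicts
            migrated[model_id] = value
    return migrated

print(migrate({"datamata/rtdetr-l": "detect"}))
```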

## Key Conventions

@@ -102,4 +102,27 @@ Two-step: `POST /v1/sessions` creates a session → `WS /v1/stream/{session_id}`
| `mataserver/schemas/requests.py` | `InferParams`, `to_mata_kwargs()`, `SUPPORTED_TASKS` |
| `mataserver/core/result_converter.py` | MATA `VisionResult` → `InferResponse` dispatch |
| `mataserver/api/deps.py` | Auth dependencies (HTTP + WebSocket) |
| `mataserver/models/registry.py` | Persistent HF model ID → task map |
| `mataserver/models/registry.py` | Persistent model ID → task + source map |
| `mataserver/core/backend_catalog.py` | Static catalog of pip-based OCR backends |
| `mataserver/core/pip_installer.py` | Pip install helper for non-HF backends |

### Backend Catalog (pip-based backends)

`mataserver/core/backend_catalog.py` is a **static Python catalog** (not JSON/YAML) that maps short backend names to installation metadata. This prevents arbitrary pip installs from user input.

```python
from mataserver.core.backend_catalog import lookup, is_cataloged, get_source_type

entry = lookup("easyocr") # CatalogEntry or None
is_cataloged("easyocr") # True
get_source_type("easyocr") # "pip"
get_source_type("org/model") # "hf"
```

Currently cataloged pip backends: `easyocr`, `paddleocr`, `tesseract`.

When adding a new pip backend:

1. Add a `CatalogEntry` to `_CATALOG` in `backend_catalog.py`.
2. `pull.py` and `mataserver/api/v1/models.py` dispatch automatically.
3. Register a result converter with `@_register("ocr")` in `result_converter.py` if needed.
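
The steps above can be illustrated with a minimal, self-contained version of the catalog pattern. The `CatalogEntry` fields here are assumptions for illustration; consult `backend_catalog.py` for the real definition:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Sketch of a catalog entry; real fields may differ."""
    name: str
    pip_packages: tuple  # packages passed to the pip installer
    default_task: str = "ocr"


# Static dict keyed by short backend name -- the allowlist that
# prevents arbitrary pip installs from user input.
_CATALOG = {
    "easyocr": CatalogEntry("easyocr", ("easyocr",)),
    "paddleocr": CatalogEntry("paddleocr", ("paddlepaddle", "paddleocr")),
    "tesseract": CatalogEntry("tesseract", ("pytesseract",)),
}


def lookup(name: str):
    """Return the CatalogEntry for a cataloged name, else None."""
    return _CATALOG.get(name)


def is_cataloged(name: str) -> bool:
    return name in _CATALOG


def get_source_type(name: str) -> str:
    """Cataloged short names are pip backends; everything else is HF."""
    return "pip" if is_cataloged(name) else "hf"


print(get_source_type("easyocr"))    # pip
print(get_source_type("org/model"))  # hf
```

Adding a new backend is then a one-line change to `_CATALOG`, which is why the dispatch in `pull.py` and the API layer needs no edits.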
20 changes: 20 additions & 0 deletions Dockerfile
@@ -72,4 +72,24 @@ ENV MATA_SERVER_PORT=8110
ENV MATA_SERVER_DATA_DIR=/var/lib/mataserver
ENV PYTHONPATH=/usr/local/lib/python3.11/site-packages

# Optional: Pre-install OCR backends into the image at build time.
# Pre-baking avoids runtime pip installs and removes the need for outbound internet
# access in the container. Uncomment the backends you need.
#
# EasyOCR:
# RUN pip install --no-cache-dir easyocr
#
# PaddleOCR:
# RUN pip install --no-cache-dir paddlepaddle paddleocr
#
# Tesseract (requires system binary + Python binding):
# RUN apt-get update && apt-get install -y --no-install-recommends tesseract-ocr \
# && rm -rf /var/lib/apt/lists/* \
# && pip install --no-cache-dir pytesseract
#
# After installing pip packages, register each backend so it appears in the registry:
# RUN mataserver pull easyocr --task ocr
# RUN mataserver pull paddleocr --task ocr
# RUN mataserver pull tesseract --task ocr

ENTRYPOINT ["mataserver", "serve"]
62 changes: 49 additions & 13 deletions README.md
@@ -93,16 +93,16 @@ curl http://localhost:8110/v1/health

The `mataserver` console script provides commands for server management and model operations.

| Command | Description |
| ------------------------------ | ---------------------------------------------- |
| `mataserver serve` | Start the inference server |
| `mataserver pull <m> --task T` | Download and register a model from HuggingFace |
| `mataserver list` | List all registered models (alias: `ls`) |
| `mataserver show <m>` | Show detailed info for a model |
| `mataserver rm <m>` | Remove a model from the registry |
| `mataserver load <m>` | Preload a model into memory (alias: `warmup`) |
| `mataserver stop <m>` | Unload a model from memory |
| `mataserver version` | Print version (also: `mataserver -v`) |
| Command | Description |
| ------------------------------ | ------------------------------------------------------------------ |
| `mataserver serve` | Start the inference server |
| `mataserver pull <m> --task T` | Download/install and register a model (HuggingFace or pip backend) |
| `mataserver list` | List all registered models (alias: `ls`) |
| `mataserver show <m>` | Show detailed info for a model |
| `mataserver rm <m>` | Remove a model from the registry |
| `mataserver load <m>` | Preload a model into memory (alias: `warmup`) |
| `mataserver stop <m>` | Unload a model from memory |
| `mataserver version` | Print version (also: `mataserver -v`) |

For full usage details, argument references, and examples, see [docs/api.md](docs/api.md#cli).

@@ -167,17 +167,53 @@ curl http://localhost:8110/v1/health
{ "status": "ok", "version": "0.1.0", "gpu_available": false }
```

### Pull a model from HuggingFace
### Pull a model

```bash
# HuggingFace model
curl -X POST http://localhost:8110/v1/models/pull \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"source": "hf://datamata/rtdetr-l"}'
-d '{"model": "datamata/rtdetr-l", "task": "detect"}'
```

```json
{ "status": "pulling", "model": "datamata/rtdetr-l" }
{ "status": "pulled", "model": "datamata/rtdetr-l" }
```

```bash
# Pip-based OCR backend
curl -X POST http://localhost:8110/v1/models/pull \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"model": "easyocr", "task": "ocr"}'
```

Or via the CLI:

```bash
# HuggingFace detection model (example: RT-DETR with ResNet-18 backbone)
mataserver pull PekingU/rtdetr_r18vd --task detect

# HuggingFace classification model (example: ResNet-50)
mataserver pull microsoft/resnet-50 --task classify

# HuggingFace segmentation model (example: Mask2Former Swin-Tiny trained on COCO)
mataserver pull facebook/mask2former-swin-tiny-coco-instance --task segment

# HuggingFace depth estimation model (example: Depth Anything V2 Small)
mataserver pull depth-anything/Depth-Anything-V2-Small-hf --task depth

# HuggingFace vision-language model (VLM)
mataserver pull Qwen/Qwen3-VL-2B-Instruct --task vlm

# HuggingFace OCR model
mataserver pull stepfun-ai/GOT-OCR-2.0-hf --task ocr

# Pip-installed OCR backends
mataserver pull easyocr --task ocr
mataserver pull paddleocr --task ocr
mataserver pull tesseract --task ocr # requires tesseract system binary
```
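
Taken together, these examples exercise the two install paths. A compact sketch of the dispatch follows; the task names are taken from the examples on this page, but the function and its routing rule are inferred for illustration, not copied from `pull.py`:

```python
# Illustrative routing for `pull <model> --task T`.
SUPPORTED_TASKS = {"detect", "classify", "segment", "depth", "vlm", "ocr"}
PIP_BACKENDS = {"easyocr", "paddleocr", "tesseract"}


def plan_pull(model: str, task: str) -> tuple:
    """Return (source, action) describing how a pull would be fulfilled."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task}")
    if model in PIP_BACKENDS:
        # Cataloged short name -> routed through the pip installer.
        return ("pip", f"pip install {model}")
    # Anything else is treated as a HuggingFace repo ID.
    return ("hf", f"download {model} via huggingface_hub")


print(plan_pull("easyocr", "ocr")[0])               # pip
print(plan_pull("datamata/rtdetr-l", "detect")[0])  # hf
```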

### Single-shot inference (base64 JSON)