# ML Model Serving Platform on Kubernetes

A platform to register ML models, auto-containerize them, and serve them behind versioned, autoscaled endpoints — with a CLI, API gateway, live dashboard, and per-model metrics.

Tech stack: Python, FastAPI, Docker, Kubernetes, Jinja2, Click, Pydantic

## Quick Start

The only prerequisite is Docker Desktop.

```bash
git clone https://github.com/antmlap/modelfleet.git
cd modelfleet
./run.sh
```

This builds all containers, starts the platform, and opens the dashboard at http://localhost:8080.
```
╔══════════════════════════════════════════════╗
║            ModelFleet is running!            ║
╚══════════════════════════════════════════════╝
  Dashboard:  http://localhost:8080
  API Docs:   http://localhost:8080/docs
  To stop:    docker compose down
```
## Features

- Model Registry — auto-discovers models from a `models/` directory, validates the handler interface
- Auto-containerization — generates Dockerfiles from templates, builds tagged images per model
- API Gateway — routes `POST /models/{name}/{version}/predict` to the correct backend container
- Live Dashboard — shows deployed models, request metrics, latency, error rates, and lets you test predictions in-browser
- Per-model Metrics — tracks request count, average latency, and error rate per model/version
- Kubernetes Deployment — Jinja2-templated Deployments, Services, and HPAs with rolling updates
- CLI Tool — `modelfleet build`, `deploy`, `status`, `delete`, `metrics`
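The per-model metrics feature can be sketched as a small in-process tracker. This is a hedged sketch, not the repo's actual `gateway/metrics.py`; the `MetricsTracker` class and its method names are illustrative.

```python
from collections import defaultdict

class MetricsTracker:
    """Per-model/version request metrics: count, average latency, error rate.

    Illustrative sketch only; the real gateway/metrics.py may differ.
    """

    def __init__(self):
        # Keyed by (model, version); each entry accumulates raw counters.
        self.stats = defaultdict(
            lambda: {"count": 0, "errors": 0, "total_latency_ms": 0.0}
        )

    def record(self, model: str, version: str, latency_ms: float, error: bool = False):
        s = self.stats[(model, version)]
        s["count"] += 1
        s["total_latency_ms"] += latency_ms
        if error:
            s["errors"] += 1

    def summary(self, model: str, version: str) -> dict:
        s = self.stats[(model, version)]
        count = s["count"] or 1  # avoid division by zero for unseen models
        return {
            "requests": s["count"],
            "avg_latency_ms": s["total_latency_ms"] / count,
            "error_rate": s["errors"] / count,
        }

tracker = MetricsTracker()
tracker.record("sentiment", "v1", 42.5)
tracker.record("sentiment", "v1", 37.5, error=True)
print(tracker.summary("sentiment", "v1"))
# {'requests': 2, 'avg_latency_ms': 40.0, 'error_rate': 0.5}
```

Keeping raw sums rather than running averages makes `record` cheap on the request path and defers the division to read time.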
## Example Predictions

```bash
# Sentiment analysis (HuggingFace DistilBERT)
curl -X POST http://localhost:8080/models/sentiment/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"text": "I love this product!"}}'
```

```json
{
  "result": { "label": "positive", "confidence": 0.9998, "text": "I love this product!" },
  "latency_ms": 42.5,
  "model_name": "sentiment",
  "model_version": "v1"
}
```

```bash
# Price movement classifier (scikit-learn)
curl -X POST http://localhost:8080/models/price_classifier/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"prices": [100.0, 101.5, 99.8, 102.3, 105.0]}}'
```

```json
{
  "result": { "prediction": "UP", "pct_change": 5.0, "prices_received": 5 },
  "latency_ms": 0.06,
  "model_name": "price_classifier",
  "model_version": "v1"
}
```

## Architecture

```
Client
   │
   ▼
┌─────────────────────────────────────┐
│       ModelFleet API Gateway        │
│  POST /models/{name}/{ver}/predict  │
│      + Dashboard UI + Metrics       │
└──────────────┬──────────────────────┘
               │ routes by model + version
       ┌───────┼────────┐
       ▼       ▼        ▼
  ┌────────┐ ┌────────┐ ┌────────┐
  │Model A │ │Model B │ │Model C │
  │ Server │ │ Server │ │ Server │
  └────────┘ └────────┘ └────────┘
```

Each container runs:
- a FastAPI server exposing `/predict`, `/health`, and `/metrics`
- the model handler code
- the model weights / dependencies
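The response envelope shown in the examples above (`result`, `latency_ms`, `model_name`, `model_version`) could be assembled by a model server roughly like this. This is a sketch, not the repo's `server/serve.py`; `wrap_prediction` and `EchoHandler` are illustrative names.

```python
import time

def wrap_prediction(handler, model_name: str, model_version: str, input_data: dict) -> dict:
    """Run a handler's predict() and wrap it in the response envelope.

    Illustrative sketch; field names match the example responses above.
    """
    start = time.perf_counter()
    result = handler.predict(input_data)
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "result": result,
        "latency_ms": round(latency_ms, 2),
        "model_name": model_name,
        "model_version": model_version,
    }

class EchoHandler:
    """Stand-in handler used only for this demonstration."""
    def predict(self, input_data: dict) -> dict:
        return {"echo": input_data}

resp = wrap_prediction(EchoHandler(), "echo", "v1", {"text": "hi"})
print(resp["model_name"], resp["result"])
```

Measuring latency around the `predict()` call only, rather than the whole HTTP request, keeps the reported number focused on model inference time.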
Two deployment modes:

| Mode | Command | Use Case |
|---|---|---|
| Docker Compose | `./run.sh` | Local demo, portfolio review |
| Kubernetes | `modelfleet deploy` | Full K8s with autoscaling, HPA, rolling updates |
## Add Your Own Model

- Create a folder under `models/` with a `handler.py` and `requirements.txt`:

```python
class ModelHandler:
    def __init__(self):
        """Load model weights. Runs once at container startup."""
        self.model = load_my_model()

    def predict(self, input_data: dict) -> dict:
        """Run inference and return results."""
        return {"prediction": self.model.predict(input_data)}
```

- Add it to `config/routes.yaml` and `docker-compose.yml`, then run `./run.sh`.

For Kubernetes: `modelfleet build my_model -v v1 && modelfleet deploy my_model -v v1`
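As a concrete illustration of the handler interface, here is a hypothetical rule-based handler in the spirit of the bundled `price_classifier` (the actual `models/price_classifier/handler.py` may differ):

```python
class ModelHandler:
    """Hypothetical rule-based price-trend handler; shown only to
    illustrate the interface, not the repo's actual implementation."""

    def __init__(self):
        # No weights to load for a rule-based model.
        pass

    def predict(self, input_data: dict) -> dict:
        prices = input_data["prices"]
        # Percent change from first to last observed price.
        pct_change = (prices[-1] - prices[0]) / prices[0] * 100
        return {
            "prediction": "UP" if pct_change > 0 else "DOWN",
            "pct_change": round(pct_change, 2),
            "prices_received": len(prices),
        }

handler = ModelHandler()
print(handler.predict({"prices": [100.0, 101.5, 99.8, 102.3, 105.0]}))
# {'prediction': 'UP', 'pct_change': 5.0, 'prices_received': 5}
```

Because the server instantiates `ModelHandler` once at startup, any expensive loading belongs in `__init__`, while `predict` stays per-request.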
## Kubernetes Deployment

For the full K8s experience with autoscaling, rolling updates, and HPA:

```bash
pip install -e .
k3d cluster create modelfleet --port "8080:80@loadbalancer" --agents 2
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/gateway-configmap.yaml
modelfleet build sentiment -v v1
modelfleet deploy sentiment -v v1
modelfleet status
```

## CLI Reference

| Command | Description |
|---|---|
| `modelfleet list` | List all registered models |
| `modelfleet build <model> -v <ver>` | Build a Docker image |
| `modelfleet deploy <model> -v <ver>` | Deploy to K8s cluster |
| `modelfleet status [model]` | Show deployment status |
| `modelfleet delete <model> -v <ver>` | Remove a deployment |
| `modelfleet metrics` | Per-model request metrics |
## Project Structure

```
ModelFleet/
├── run.sh                 ← One-command launcher
├── docker-compose.yml     ← Full stack definition
├── Dockerfile.model       ← Generic model image builder
│
├── modelfleet/            ← Python package (CLI + core logic)
│   ├── cli.py             ← Click CLI: build, deploy, list, status, delete, metrics
│   ├── builder.py         ← Docker image builder (Jinja2 templates → docker build)
│   ├── deployer.py        ← K8s deployer (renders + applies manifests via kubectl)
│   ├── registry.py        ← Model discovery & handler validation
│   └── config.py          ← Shared constants and paths
│
├── gateway/               ← API Gateway + Dashboard
│   ├── main.py            ← FastAPI: routing, metrics, static file serving
│   ├── metrics.py         ← Per-model request/latency/error tracking
│   ├── static/index.html  ← Dashboard UI
│   └── Dockerfile
│
├── server/                ← Generic model server (baked into each model image)
│   └── serve.py           ← FastAPI: /predict, /health, /metrics + Pydantic validation
│
├── models/                ← Model definitions
│   ├── sentiment/         ← DistilBERT sentiment classifier (HuggingFace Transformers)
│   └── price_classifier/  ← Price movement predictor (scikit-learn / rule-based)
│
├── templates/             ← Jinja2 templates for K8s manifests
│   ├── deployment.yaml.j2
│   ├── service.yaml.j2
│   └── hpa.yaml.j2
│
├── k8s/                   ← Static K8s manifests (namespace, gateway, configmap, ingress)
├── config/                ← Docker Compose routing config
└── tests/                 ← 19 pytest tests (registry, builder, CLI, handler validation)
```
## Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Container per model | Each model gets its own Docker image | Dependency isolation — PyTorch model doesn't need sklearn |
| FastAPI | Over Flask/gRPC | Async, auto OpenAPI docs, Pydantic validation, fast |
| Centralized gateway | Single entry point | Unified routing, metrics, auth surface |
| Two deploy modes | Docker Compose + K8s | Easy demo for reviewers + production-grade for clusters |
| ConfigMap routing | YAML, not a database | Simple, K8s-native, zero extra infrastructure |
| HPA on CPU | Not custom metrics | Works out of the box; Prometheus is a stretch goal |
| Rolling updates | maxSurge=1, maxUnavailable=0 | Zero-downtime version bumps |
| Jinja2 templates | Not Helm (initially) | Lower complexity, easier to understand and extend |
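The rolling-update row above corresponds to a Deployment strategy block along these lines (a sketch of what `templates/deployment.yaml.j2` might render, not the template itself):

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # start one new pod before taking any old one down
    maxUnavailable: 0  # never drop below the desired replica count
```

With these settings a version bump brings up each new pod, waits for it to become ready, and only then terminates an old one, which is what makes the rollout zero-downtime.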
## Testing

```bash
pip install -e .
pytest tests/ -v
```

19 tests covering the model registry, builder, CLI commands, and handler validation.
## License

MIT