
ModelFleet

ML Model Serving Platform on Kubernetes

A platform to register ML models, auto-containerize them, and serve them behind versioned, autoscaled endpoints — with a CLI, API gateway, live dashboard, and per-model metrics.

Tech stack: Python, FastAPI, Docker, Kubernetes, Jinja2, Click, Pydantic


Quick Start (One Command)

Only prerequisite: Docker Desktop

git clone https://github.com/antmlap/modelfleet.git
cd modelfleet
./run.sh

This builds all containers, starts the platform, and opens the dashboard at http://localhost:8080.

  ╔══════════════════════════════════════════════╗
  ║   ModelFleet is running!                     ║
  ╚══════════════════════════════════════════════╝

  Dashboard:  http://localhost:8080
  API Docs:   http://localhost:8080/docs

To stop: docker compose down


What It Does

  • Model Registry — auto-discovers models from a models/ directory, validates the handler interface
  • Auto-containerization — generates Dockerfiles from templates, builds tagged images per model
  • API Gateway — routes POST /models/{name}/{version}/predict to the correct backend container
  • Live Dashboard — shows deployed models, request metrics, latency, error rates, and lets you test predictions in-browser
  • Per-model Metrics — tracks request count, avg latency, and error rate per model/version
  • Kubernetes Deployment — Jinja2-templated Deployments, Services, and HPAs with rolling updates
  • CLI Tool — modelfleet build, deploy, status, delete, metrics
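The per-model metrics above can be kept in a simple in-memory structure. A minimal sketch of the idea (not the actual gateway/metrics.py implementation, which may differ):

```python
from collections import defaultdict


class MetricsTracker:
    """Tracks request count, average latency, and error rate per model/version."""

    def __init__(self):
        # Keyed by (model_name, model_version)
        self._stats = defaultdict(
            lambda: {"requests": 0, "errors": 0, "total_latency_ms": 0.0}
        )

    def record(self, model: str, version: str, latency_ms: float, error: bool = False) -> None:
        """Record one request's outcome for a given model/version."""
        s = self._stats[(model, version)]
        s["requests"] += 1
        s["total_latency_ms"] += latency_ms
        if error:
            s["errors"] += 1

    def summary(self, model: str, version: str) -> dict:
        """Return the aggregate metrics shown on the dashboard."""
        s = self._stats[(model, version)]
        n = s["requests"]
        return {
            "requests": n,
            "avg_latency_ms": s["total_latency_ms"] / n if n else 0.0,
            "error_rate": s["errors"] / n if n else 0.0,
        }
```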

Example Predictions

# Sentiment analysis (HuggingFace DistilBERT)
curl -X POST http://localhost:8080/models/sentiment/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"text": "I love this product!"}}'
{
  "result": { "label": "positive", "confidence": 0.9998, "text": "I love this product!" },
  "latency_ms": 42.5,
  "model_name": "sentiment",
  "model_version": "v1"
}
# Price movement classifier (scikit-learn)
curl -X POST http://localhost:8080/models/price_classifier/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"prices": [100.0, 101.5, 99.8, 102.3, 105.0]}}'
{
  "result": { "prediction": "UP", "pct_change": 5.0, "prices_received": 5 },
  "latency_ms": 0.06,
  "model_name": "price_classifier",
  "model_version": "v1"
}

Architecture

  Client
    │
    ▼
┌─────────────────────────────────────┐
│       ModelFleet API Gateway        │
│  POST /models/{name}/{ver}/predict  │
│  + Dashboard UI + Metrics           │
└──────────────┬──────────────────────┘
               │  routes by model + version
       ┌───────┼────────┐
       ▼       ▼        ▼
   ┌────────┐ ┌────────┐ ┌────────┐
   │Model A │ │Model B │ │Model C │
   │ Server │ │ Server │ │ Server │
   └────────┘ └────────┘ └────────┘

   Each container runs:
     FastAPI /predict + /health + /metrics
     + model handler code
     + model weights / dependencies

Two deployment modes:

  Mode             Command             Use Case
  Docker Compose   ./run.sh            Local demo, portfolio review
  Kubernetes       modelfleet deploy   Full K8s with autoscaling, HPA, rolling updates

Adding Your Own Model

  1. Create a folder under models/ with a handler.py and requirements.txt:
class ModelHandler:
    def __init__(self):
        """Load model weights. Runs once at container startup."""
        self.model = load_my_model()

    def predict(self, input_data: dict) -> dict:
        """Run inference and return results."""
        return {"prediction": self.model.predict(input_data)}
  2. Add it to config/routes.yaml and docker-compose.yml, then run ./run.sh.

For Kubernetes: modelfleet build my_model -v v1 && modelfleet deploy my_model -v v1
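The registry's handler validation can be approximated as a shape check on the class you define above. A hedged sketch (the actual checks in modelfleet/registry.py may differ):

```python
import inspect


def validate_handler(handler_cls) -> list:
    """Return a list of problems with a candidate ModelHandler class (empty if it looks valid)."""
    problems = []
    if not inspect.isclass(handler_cls):
        problems.append("handler is not a class")
        return problems
    predict = getattr(handler_cls, "predict", None)
    if not callable(predict):
        problems.append("missing callable predict() method")
    else:
        # Expect at least (self, input_data)
        params = list(inspect.signature(predict).parameters)
        if len(params) < 2:
            problems.append("predict() must accept an input_data argument")
    return problems
```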


Kubernetes Deployment

For the full K8s experience with autoscaling, rolling updates, and HPA:

pip install -e .

k3d cluster create modelfleet --port "8080:80@loadbalancer" --agents 2
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/gateway-configmap.yaml

modelfleet build sentiment -v v1
modelfleet deploy sentiment -v v1
modelfleet status
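Under the hood, deploy renders a manifest per model/version and applies it with kubectl. The real deployer uses the Jinja2 templates under templates/; this stdlib string.Template sketch shows only the render step, with a truncated manifest for illustration:

```python
import string

# Truncated stand-in for templates/deployment.yaml.j2 (the real template is Jinja2)
DEPLOYMENT_TEMPLATE = string.Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${model}-${version}
  namespace: modelfleet
spec:
  replicas: ${replicas}
""")


def render_deployment(model: str, version: str, replicas: int = 1) -> str:
    """Fill the manifest template for one model/version."""
    return DEPLOYMENT_TEMPLATE.substitute(model=model, version=version, replicas=replicas)


# The deployer would then pipe the rendered manifest to `kubectl apply -f -`.
```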

CLI Reference

  Command                              Description
  modelfleet list                      List all registered models
  modelfleet build <model> -v <ver>    Build a Docker image
  modelfleet deploy <model> -v <ver>   Deploy to K8s cluster
  modelfleet status [model]            Show deployment status
  modelfleet delete <model> -v <ver>   Remove a deployment
  modelfleet metrics                   Per-model request metrics

Project Structure

ModelFleet/
├── run.sh                ← One-command launcher
├── docker-compose.yml    ← Full stack definition
├── Dockerfile.model      ← Generic model image builder
│
├── modelfleet/           ← Python package (CLI + core logic)
│   ├── cli.py            ← Click CLI: build, deploy, list, status, delete, metrics
│   ├── builder.py        ← Docker image builder (Jinja2 templates → docker build)
│   ├── deployer.py       ← K8s deployer (renders + applies manifests via kubectl)
│   ├── registry.py       ← Model discovery & handler validation
│   └── config.py         ← Shared constants and paths
│
├── gateway/              ← API Gateway + Dashboard
│   ├── main.py           ← FastAPI: routing, metrics, static file serving
│   ├── metrics.py        ← Per-model request/latency/error tracking
│   ├── static/index.html ← Dashboard UI
│   └── Dockerfile
│
├── server/               ← Generic model server (baked into each model image)
│   └── serve.py          ← FastAPI: /predict, /health, /metrics + Pydantic validation
│
├── models/               ← Model definitions
│   ├── sentiment/        ← DistilBERT sentiment classifier (HuggingFace Transformers)
│   └── price_classifier/ ← Price movement predictor (scikit-learn / rule-based)
│
├── templates/            ← Jinja2 templates for K8s manifests
│   ├── deployment.yaml.j2
│   ├── service.yaml.j2
│   └── hpa.yaml.j2
│
├── k8s/                  ← Static K8s manifests (namespace, gateway, configmap, ingress)
├── config/               ← Docker Compose routing config
└── tests/                ← 19 pytest tests (registry, builder, CLI, handler validation)

Design Decisions

  Decision              Choice                          Rationale
  Container per model   One Docker image per model      Dependency isolation — a PyTorch model doesn't need sklearn
  FastAPI               Over Flask/gRPC                 Async, auto OpenAPI docs, Pydantic validation, fast
  Centralized gateway   Single entry point              Unified routing, metrics, auth surface
  Two deploy modes      Docker Compose + K8s            Easy demo for reviewers + production-grade for clusters
  ConfigMap routing     YAML, not a database            Simple, K8s-native, zero extra infrastructure
  HPA on CPU            Not custom metrics              Works out of the box; Prometheus is a stretch goal
  Rolling updates       maxSurge=1, maxUnavailable=0    Zero-downtime version bumps
  Jinja2 templates      Not Helm (initially)            Lower complexity, easier to understand and extend

Running Tests

pip install -e .
pytest tests/ -v

19 tests covering model registry, builder, CLI commands, and handler validation.


License

MIT
