# ML Model Serving Platform on Kubernetes

A platform to register ML models, auto-containerize them, and serve them behind versioned, autoscaled endpoints — with a CLI, API gateway, live dashboard, and per-model metrics.

Tech stack: Python, FastAPI, Docker, Kubernetes, Jinja2, Click, Pydantic

## Quick Start

The only prerequisite is Docker Desktop.

```bash
git clone https://github.com/antmlap/modelfleet.git
cd modelfleet
./run.sh
```

This builds all containers, starts the platform, and opens the dashboard at http://localhost:8080.
```
╔══════════════════════════════════════════════╗
║            ModelFleet is running!            ║
╚══════════════════════════════════════════════╝
  Dashboard:  http://localhost:8080
  API Docs:   http://localhost:8080/docs
  To stop:    docker compose down
```
## Features

- Model Registry — auto-discovers models from a `models/` directory, validates the handler interface
- Auto-containerization — generates Dockerfiles from templates, builds tagged images per model
- API Gateway — routes `POST /models/{name}/{version}/predict` to the correct backend container
- Live Dashboard — shows deployed models, request metrics, latency, error rates, and lets you test predictions in-browser
- Per-model Metrics — tracks request count, average latency, and error rate per model/version
- Kubernetes Deployment — Jinja2-templated Deployments, Services, and HPAs with rolling updates
- CLI Tool — `modelfleet build`, `deploy`, `status`, `delete`, `metrics`
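The per-model metrics feature can be sketched as a small in-process tracker. This is a hedged sketch, not the repo's actual `gateway/metrics.py`; the `MetricsTracker` class and its method names are illustrative.

```python
from collections import defaultdict

class MetricsTracker:
    """Per-model/version request metrics: count, average latency, error rate.

    Illustrative sketch only; the real gateway/metrics.py may differ.
    """

    def __init__(self):
        # Keyed by (model, version); each entry accumulates raw counters.
        self.stats = defaultdict(
            lambda: {"count": 0, "errors": 0, "total_latency_ms": 0.0}
        )

    def record(self, model: str, version: str, latency_ms: float, error: bool = False):
        s = self.stats[(model, version)]
        s["count"] += 1
        s["total_latency_ms"] += latency_ms
        if error:
            s["errors"] += 1

    def summary(self, model: str, version: str) -> dict:
        s = self.stats[(model, version)]
        count = s["count"] or 1  # avoid division by zero for unseen models
        return {
            "requests": s["count"],
            "avg_latency_ms": s["total_latency_ms"] / count,
            "error_rate": s["errors"] / count,
        }

tracker = MetricsTracker()
tracker.record("sentiment", "v1", 42.5)
tracker.record("sentiment", "v1", 37.5, error=True)
print(tracker.summary("sentiment", "v1"))
# {'requests': 2, 'avg_latency_ms': 40.0, 'error_rate': 0.5}
```

Keeping raw sums rather than running averages makes `record` cheap on the request path and defers the division to read time.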
## Example Predictions

```bash
# Sentiment analysis (HuggingFace DistilBERT)
curl -X POST http://localhost:8080/models/sentiment/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"text": "I love this product!"}}'
```

```json
{
  "result": { "label": "positive", "confidence": 0.9998, "text": "I love this product!" },
  "latency_ms": 42.5,
  "model_name": "sentiment",
  "model_version": "v1"
}
```

```bash
# Price movement classifier (scikit-learn)
curl -X POST http://localhost:8080/models/price_classifier/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"prices": [100.0, 101.5, 99.8, 102.3, 105.0]}}'
```

```json
{
  "result": { "prediction": "UP", "pct_change": 5.0, "prices_received": 5 },
  "latency_ms": 0.06,
  "model_name": "price_classifier",
  "model_version": "v1"
}
```

## Architecture

```
Client
   │
   ▼
┌─────────────────────────────────────┐
│       ModelFleet API Gateway        │
│  POST /models/{name}/{ver}/predict  │
│      + Dashboard UI + Metrics       │
└──────────────┬──────────────────────┘
               │ routes by model + version
       ┌───────┼────────┐
       ▼       ▼        ▼
  ┌────────┐ ┌────────┐ ┌────────┐
  │Model A │ │Model B │ │Model C │
  │ Server │ │ Server │ │ Server │
  └────────┘ └────────┘ └────────┘
```

Each container runs:
- a FastAPI server exposing `/predict`, `/health`, and `/metrics`
- the model handler code
- the model weights / dependencies
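The response envelope shown in the examples above (`result`, `latency_ms`, `model_name`, `model_version`) could be assembled by a model server roughly like this. This is a sketch, not the repo's `server/serve.py`; `wrap_prediction` and `EchoHandler` are illustrative names.

```python
import time

def wrap_prediction(handler, model_name: str, model_version: str, input_data: dict) -> dict:
    """Run a handler's predict() and wrap it in the response envelope.

    Illustrative sketch; field names match the example responses above.
    """
    start = time.perf_counter()
    result = handler.predict(input_data)
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "result": result,
        "latency_ms": round(latency_ms, 2),
        "model_name": model_name,
        "model_version": model_version,
    }

class EchoHandler:
    """Stand-in handler used only for this demonstration."""
    def predict(self, input_data: dict) -> dict:
        return {"echo": input_data}

resp = wrap_prediction(EchoHandler(), "echo", "v1", {"text": "hi"})
print(resp["model_name"], resp["result"])
```

Measuring latency around the `predict()` call only, rather than the whole HTTP request, keeps the reported number focused on model inference time.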
Two deployment modes:

| Mode | Command | Use Case |
|---|---|---|
| Docker Compose | `./run.sh` | Local demo, portfolio review |
| Kubernetes | `modelfleet deploy` | Full K8s with autoscaling, HPA, rolling updates |
## Add Your Own Model

- Create a folder under `models/` with a `handler.py` and `requirements.txt`:

```python
class ModelHandler:
    def __init__(self):
        """Load model weights. Runs once at container startup."""
        self.model = load_my_model()

    def predict(self, input_data: dict) -> dict:
        """Run inference and return results."""
        return {"prediction": self.model.predict(input_data)}
```

- Add it to `config/routes.yaml` and `docker-compose.yml`, then run `./run.sh`.

For Kubernetes: `modelfleet build my_model -v v1 && modelfleet deploy my_model -v v1`
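As a concrete illustration of the handler interface, here is a hypothetical rule-based handler in the spirit of the bundled `price_classifier` (the actual `models/price_classifier/handler.py` may differ):

```python
class ModelHandler:
    """Hypothetical rule-based price-trend handler; shown only to
    illustrate the interface, not the repo's actual implementation."""

    def __init__(self):
        # No weights to load for a rule-based model.
        pass

    def predict(self, input_data: dict) -> dict:
        prices = input_data["prices"]
        # Percent change from first to last observed price.
        pct_change = (prices[-1] - prices[0]) / prices[0] * 100
        return {
            "prediction": "UP" if pct_change > 0 else "DOWN",
            "pct_change": round(pct_change, 2),
            "prices_received": len(prices),
        }

handler = ModelHandler()
print(handler.predict({"prices": [100.0, 101.5, 99.8, 102.3, 105.0]}))
# {'prediction': 'UP', 'pct_change': 5.0, 'prices_received': 5}
```

Because the server instantiates `ModelHandler` once at startup, any expensive loading belongs in `__init__`, while `predict` stays per-request.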
## Kubernetes Deployment

For the full K8s experience with autoscaling, rolling updates, and HPA:

```bash
pip install -e .
k3d cluster create modelfleet --port "8080:80@loadbalancer" --agents 2
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/gateway-configmap.yaml
modelfleet build sentiment -v v1
modelfleet deploy sentiment -v v1
modelfleet status
```

## CLI Reference

| Command | Description |
|---|---|
| `modelfleet list` | List all registered models |
| `modelfleet build <model> -v <ver>` | Build a Docker image |
| `modelfleet deploy <model> -v <ver>` | Deploy to K8s cluster |
| `modelfleet status [model]` | Show deployment status |
| `modelfleet delete <model> -v <ver>` | Remove a deployment |
| `modelfleet metrics` | Per-model request metrics |
## Project Structure

```
ModelFleet/
├── run.sh                 ← One-command launcher
├── docker-compose.yml     ← Full stack definition
├── Dockerfile.model       ← Generic model image builder
│
├── modelfleet/            ← Python package (CLI + core logic)
│   ├── cli.py             ← Click CLI: build, deploy, list, status, delete, metrics
│   ├── builder.py         ← Docker image builder (Jinja2 templates → docker build)
│   ├── deployer.py        ← K8s deployer (renders + applies manifests via kubectl)
│   ├── registry.py        ← Model discovery & handler validation
│   └── config.py          ← Shared constants and paths
│
├── gateway/               ← API Gateway + Dashboard
│   ├── main.py            ← FastAPI: routing, metrics, static file serving
│   ├── metrics.py         ← Per-model request/latency/error tracking
│   ├── static/index.html  ← Dashboard UI
│   └── Dockerfile
│
├── server/                ← Generic model server (baked into each model image)
│   └── serve.py           ← FastAPI: /predict, /health, /metrics + Pydantic validation
│
├── models/                ← Model definitions
│   ├── sentiment/         ← DistilBERT sentiment classifier (HuggingFace Transformers)
│   └── price_classifier/  ← Price movement predictor (scikit-learn / rule-based)
│
├── templates/             ← Jinja2 templates for K8s manifests
│   ├── deployment.yaml.j2
│   ├── service.yaml.j2
│   └── hpa.yaml.j2
│
├── k8s/                   ← Static K8s manifests (namespace, gateway, configmap, ingress)
├── config/                ← Docker Compose routing config
└── tests/                 ← 19 pytest tests (registry, builder, CLI, handler validation)
```
## Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Container per model | Each model gets its own Docker image | Dependency isolation — PyTorch model doesn't need sklearn |
| FastAPI | Over Flask/gRPC | Async, auto OpenAPI docs, Pydantic validation, fast |
| Centralized gateway | Single entry point | Unified routing, metrics, auth surface |
| Two deploy modes | Docker Compose + K8s | Easy demo for reviewers + production-grade for clusters |
| ConfigMap routing | YAML, not a database | Simple, K8s-native, zero extra infrastructure |
| HPA on CPU | Not custom metrics | Works out of the box; Prometheus is a stretch goal |
| Rolling updates | maxSurge=1, maxUnavailable=0 | Zero-downtime version bumps |
| Jinja2 templates | Not Helm (initially) | Lower complexity, easier to understand and extend |
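The rolling-update row above corresponds to a Deployment strategy block along these lines (a sketch of what `templates/deployment.yaml.j2` might render, not the template itself):

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # start one new pod before taking any old one down
    maxUnavailable: 0  # never drop below the desired replica count
```

With these settings a version bump brings up each new pod, waits for it to become ready, and only then terminates an old one, which is what makes the rollout zero-downtime.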
## Testing

```bash
pip install -e .
pytest tests/ -v
```

19 tests covering the model registry, builder, CLI commands, and handler validation.
## License

MIT