Complete guide for deploying the FastAPI ML Model API to production.
- Docker / Docker Compose
- AWS (EC2, ECS, Lambda)
- Google Cloud Platform
- Azure
- Kubernetes
- Traditional VPS
# Build
docker build -t ml-model-api:latest .
# Run
docker run -d \
-p 8000:8000 \
--name ml-api \
--env-file .env \
-v $(pwd)/model_storage:/app/model_storage \
ml-model-api:latest
# Check logs
docker logs -f ml-api

Edit docker-compose.yml for production:
services:
ml-api:
build: .
restart: always
environment:
- DEBUG=false
- LOG_LEVEL=WARNING
- WORKERS=4
deploy:
resources:
limits:
cpus: '2'
          memory: 4G

Scale to multiple containers with Docker Compose:

docker-compose up -d --scale ml-api=3

To deploy on an AWS EC2 instance:

# 1. Launch EC2 instance (t3.medium or larger)
# 2. SSH into instance
ssh -i your-key.pem ubuntu@your-ec2-ip
# 3. Install Docker
sudo apt update
sudo apt install -y docker.io docker-compose
sudo usermod -aG docker ubuntu
# 4. Clone your repo
git clone your-repo
cd your-repo
# 5. Configure environment
cp env.example .env
nano .env # Edit settings
# 6. Run
docker-compose up -d
# 7. Configure security group to allow port 8000

To deploy on ECS (Fargate):

# 1. Push to ECR
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin \
your-account.dkr.ecr.us-east-1.amazonaws.com
docker tag ml-model-api:latest \
your-account.dkr.ecr.us-east-1.amazonaws.com/ml-model-api:latest
docker push your-account.dkr.ecr.us-east-1.amazonaws.com/ml-model-api:latest
# 2. Create ECS task definition (JSON)
{
"family": "ml-model-api",
"containerDefinitions": [{
"name": "ml-api",
"image": "your-account.dkr.ecr.us-east-1.amazonaws.com/ml-model-api:latest",
"portMappings": [{
"containerPort": 8000,
"protocol": "tcp"
}],
"environment": [
{"name": "DEBUG", "value": "false"},
{"name": "WORKERS", "value": "4"}
],
"memory": 2048,
"cpu": 1024
}]
}
# 3. Create ECS service
aws ecs create-service \
--cluster your-cluster \
--service-name ml-model-api \
--task-definition ml-model-api \
--desired-count 2 \
  --launch-type FARGATE

For Lambda deployment, use Mangum:
# app/lambda_handler.py
from mangum import Mangum
from app.main import app
handler = Mangum(app)

Update requirements.txt:
mangum==0.17.0

Deploy with AWS SAM or the Serverless Framework.
# 1. Build and push to GCR
gcloud builds submit --tag gcr.io/your-project/ml-model-api
# 2. Deploy to Cloud Run
gcloud run deploy ml-model-api \
--image gcr.io/your-project/ml-model-api \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--memory 2Gi \
--cpu 2 \
--set-env-vars DEBUG=false,LOG_LEVEL=INFO
# 3. Get URL
gcloud run services describe ml-model-api --region us-central1

On Compute Engine, the setup is similar to AWS EC2: launch a VM, install Docker, and run the containers.
For GKE, see the Kubernetes section.
# 1. Login
az login
# 2. Create resource group
az group create --name ml-api-rg --location eastus
# 3. Create container registry
az acr create --resource-group ml-api-rg \
--name mlapiregistry --sku Basic
# 4. Build and push
az acr build --registry mlapiregistry \
--image ml-model-api:latest .
# 5. Deploy to ACI
az container create \
--resource-group ml-api-rg \
--name ml-model-api \
--image mlapiregistry.azurecr.io/ml-model-api:latest \
--cpu 2 --memory 4 \
--registry-login-server mlapiregistry.azurecr.io \
--registry-username $(az acr credential show \
--name mlapiregistry --query username -o tsv) \
--registry-password $(az acr credential show \
--name mlapiregistry --query passwords[0].value -o tsv) \
--dns-name-label ml-api-unique \
  --ports 8000

k8s/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: ml-model-api
labels:
app: ml-model-api
spec:
replicas: 3
selector:
matchLabels:
app: ml-model-api
template:
metadata:
labels:
app: ml-model-api
spec:
containers:
- name: api
image: your-registry/ml-model-api:latest
ports:
- containerPort: 8000
env:
- name: DEBUG
value: "false"
- name: WORKERS
value: "1"
- name: LOG_LEVEL
value: "INFO"
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /api/v1/health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /api/v1/health
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
volumeMounts:
- name: model-storage
mountPath: /app/model_storage
volumes:
- name: model-storage
persistentVolumeClaim:
claimName: model-storage-pvc
---
apiVersion: v1
kind: Service
metadata:
name: ml-model-api-service
spec:
selector:
app: ml-model-api
ports:
- protocol: TCP
port: 80
targetPort: 8000
type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: model-storage-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
      storage: 10Gi

k8s/ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ml-model-api-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- api.yourdomain.com
secretName: ml-api-tls
rules:
- host: api.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ml-model-api-service
port:
              number: 80

# Apply manifests
kubectl apply -f k8s/
# Check status
kubectl get pods
kubectl get services
# View logs
kubectl logs -f deployment/ml-model-api
# Scale
kubectl scale deployment ml-model-api --replicas=5

To deploy on a traditional VPS with systemd and Nginx:

# 1. Install dependencies
sudo apt update
sudo apt install -y python3-pip python3-venv nginx
# 2. Setup application
cd /opt
sudo git clone your-repo ml-api
cd ml-api
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt gunicorn
# 3. Create systemd service
sudo nano /etc/systemd/system/ml-api.service

/etc/systemd/system/ml-api.service:
[Unit]
Description=ML Model API
After=network.target
[Service]
User=www-data
Group=www-data
WorkingDirectory=/opt/ml-api
Environment="PATH=/opt/ml-api/venv/bin"
ExecStart=/opt/ml-api/venv/bin/gunicorn app.main:app \
-w 4 \
-k uvicorn.workers.UvicornWorker \
--bind 127.0.0.1:8000 \
--timeout 120 \
--access-logfile /var/log/ml-api-access.log \
--error-logfile /var/log/ml-api-error.log
[Install]
WantedBy=multi-user.target

# 4. Configure Nginx
sudo nano /etc/nginx/sites-available/ml-api

/etc/nginx/sites-available/ml-api:
server {
listen 80;
server_name api.yourdomain.com;
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts for ML predictions
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
}
}

# 5. Enable and start
sudo ln -s /etc/nginx/sites-available/ml-api /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
sudo systemctl enable ml-api
sudo systemctl start ml-api
# 6. Check status
sudo systemctl status ml-api

Enable HTTPS with Let's Encrypt:

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d api.yourdomain.com

Never commit .env files. Use secrets management (a minimal example of reading injected secrets follows the list below):
- AWS: AWS Secrets Manager
- GCP: Secret Manager
- Azure: Key Vault
- Kubernetes: Secrets
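All of these can inject secrets into the container as environment variables, or be queried with the provider SDK at startup. A minimal sketch, assuming an `API_KEY` variable and an AWS secret named `ml-api/prod` (both names are illustrative, not part of the project):

# Sketch only: read an API key injected as an environment variable,
# falling back to AWS Secrets Manager.
import os

def get_api_key() -> str:
    key = os.environ.get("API_KEY")
    if key:
        return key
    # Optional AWS path: requires boto3 and IAM permission to read the secret.
    import boto3
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId="ml-api/prod")
    return response["SecretString"]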
Add authentication to app/main.py:
import os

from fastapi import Request
from fastapi.responses import JSONResponse

API_KEY = os.environ["API_KEY"]  # load from your secrets manager, never hard-code it

@app.middleware("http")
async def verify_api_key(request: Request, call_next):
    # Leave the landing page, health checks, and interactive docs open.
    if request.url.path not in ["/", "/health", "/api/v1/health", "/docs", "/openapi.json"]:
        auth = request.headers.get("Authorization")
        if not auth or auth != f"Bearer {API_KEY}":
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)

Add rate limiting with SlowAPI:

pip install slowapi

from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)  # return HTTP 429 when the limit is hit

@app.get("/api/v1/predict")
@limiter.limit("10/minute")
async def predict(request: Request):
    ...

Always use HTTPS in production; terminate TLS at the reverse proxy or load balancer.
A health check endpoint is built in: `GET /api/v1/health`
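To verify it from a deploy script or CI job, the endpoint can be polled directly; the host and port below are the defaults used elsewhere in this guide, and the snippet assumes `httpx` is installed:

# Minimal health probe; adjust host/port for your environment.
import httpx

resp = httpx.get("http://localhost:8000/api/v1/health", timeout=5.0)
resp.raise_for_status()
print(resp.json())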
For Prometheus metrics, install the instrumentator:

pip install prometheus-fastapi-instrumentator

from prometheus_fastapi_instrumentator import Instrumentator
Instrumentator().instrument(app).expose(app)  # exposes a /metrics endpoint

JSON logs are already configured. Ship them to:
- CloudWatch (AWS)
- Stackdriver (GCP)
- ELK Stack
- Datadog
.github/workflows/deploy.yml:
name: Deploy
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Build Docker image
run: docker build -t ml-model-api .
- name: Run tests
        run: |
          pip install -r requirements.txt pytest
          pytest
- name: Push to registry
run: |
echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
docker tag ml-model-api your-registry/ml-model-api:latest
docker push your-registry/ml-model-api:latest
- name: Deploy to production
run: |
          # Your deployment commands

Tune the number of workers to the available CPU cores:

# Calculate: (2 x CPU cores) + 1
WORKERS=5

Models load asynchronously on startup for faster boot times.
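The sketch below illustrates that pattern with a hypothetical `load_model` coroutine and in-memory registry; the project's actual startup code may differ.

import asyncio
from fastapi import FastAPI

app = FastAPI()
models: dict[str, object] = {}

async def load_model(name: str) -> object:
    # Placeholder for expensive deserialization (e.g. joblib.load in a thread).
    await asyncio.sleep(0)
    return object()

@app.on_event("startup")
async def warm_up() -> None:
    async def _load_all() -> None:
        for name in ("classifier", "regressor"):
            models[name] = await load_model(name)
    # Load in the background so the server starts accepting traffic
    # (and passing readiness probes) before every model is ready.
    asyncio.create_task(_load_all())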
Add Redis for prediction caching:
import hashlib
import json

import redis

cache = redis.Redis(host='redis', port=6379)

@app.post("/predict/{model_name}")
async def predict(model_name: str, data: dict):
    # Use a deterministic hash; Python's built-in hash() differs across worker processes.
    digest = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
    cache_key = f"{model_name}:{digest}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    result = await model_service.predict(model_name, data)
    cache.setex(cache_key, 3600, json.dumps(result))  # cache for one hour
    return result

Before deploying to production:
- Set `DEBUG=false`
- Configure proper `LOG_LEVEL`
- Set appropriate `WORKERS`
- Enable HTTPS
- Add authentication/API keys
- Set up monitoring
- Configure rate limiting
- Set up backups for model storage
- Test health checks
- Configure auto-scaling
- Set up CI/CD pipeline
- Document API endpoints
- Load test the API
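For the last checklist item, a minimal Locust script can drive load against the API; the endpoint path and payload below are assumptions and should be adapted to the real prediction schema.

# locustfile.py - run with: locust -f locustfile.py --host http://localhost:8000
from locust import HttpUser, task, between

class PredictionUser(HttpUser):
    wait_time = between(1, 3)  # seconds between simulated requests

    @task
    def predict(self):
        # Hypothetical model name and body; replace with a real request.
        self.client.post("/predict/example-model", json={"data": {"feature_1": 1.0}})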
If memory usage is too high:

- Reduce number of workers
- Unload unused models
- Use model quantization
If predictions are too slow:

- Use async models
- Add GPU support
- Implement batch predictions (see the sketch after this list)
- Add caching
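One way to implement batch predictions is micro-batching behind an asyncio queue, sketched below; `run_model_batch` is a placeholder for the project's real vectorized inference call, and `asyncio.timeout` requires Python 3.11+.

import asyncio

# Each queue item is a (payload, future) pair.
queue: asyncio.Queue = asyncio.Queue()

async def run_model_batch(batch: list[dict]) -> list[dict]:
    # Placeholder: one vectorized model call instead of N separate calls.
    return [{"prediction": 0} for _ in batch]

async def batch_worker(max_batch: int = 32, max_wait: float = 0.01) -> None:
    # Run as a background task on startup, e.g. asyncio.create_task(batch_worker()).
    while True:
        batch = [await queue.get()]
        try:
            async with asyncio.timeout(max_wait):  # Python 3.11+
                while len(batch) < max_batch:
                    batch.append(await queue.get())
        except TimeoutError:
            pass  # batch window elapsed; process what we have
        results = await run_model_batch([payload for payload, _ in batch])
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)

async def predict_batched(payload: dict) -> dict:
    # Call from the endpoint; resolves once the batch containing this payload runs.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((payload, fut))
    return await fut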
If models fail to load:

- Check logs: `docker logs ml-api`
- Increase memory limits
- Check model file paths
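A quick way to check the model file paths from inside the container, assuming the `/app/model_storage` mount used in the examples above:

# List what the API can actually see in its model storage volume.
from pathlib import Path

storage = Path("/app/model_storage")
print("exists:", storage.exists())
for path in sorted(storage.rglob("*")):
    print(path.relative_to(storage), path.stat().st_size, "bytes")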
Ready to Deploy!