Complete guide for deploying the FastAPI ML Model API to production.
- Docker / Docker Compose
- AWS (EC2, ECS, Lambda)
- Google Cloud Platform
- Azure
- Kubernetes
- Traditional VPS
# Build
docker build -t ml-model-api:latest .
# Run
docker run -d \
-p 8000:8000 \
--name ml-api \
--env-file .env \
-v $(pwd)/model_storage:/app/model_storage \
ml-model-api:latest
# Check logs
docker logs -f ml-api

Edit docker-compose.yml for production:
services:
ml-api:
build: .
restart: always
environment:
- DEBUG=false
- LOG_LEVEL=WARNING
- WORKERS=4
deploy:
resources:
limits:
cpus: '2'
          memory: 4G

Scale to multiple containers with Docker Compose:

docker-compose up -d --scale ml-api=3

To deploy on an AWS EC2 instance:

# 1. Launch EC2 instance (t3.medium or larger)
# 2. SSH into instance
ssh -i your-key.pem ubuntu@your-ec2-ip
# 3. Install Docker
sudo apt update
sudo apt install -y docker.io docker-compose
sudo usermod -aG docker ubuntu
# 4. Clone your repo
git clone your-repo
cd your-repo
# 5. Configure environment
cp env.example .env
nano .env # Edit settings
# 6. Run
docker-compose up -d
# 7. Configure security group to allow port 8000

To deploy on ECS (Fargate):

# 1. Push to ECR
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin \
your-account.dkr.ecr.us-east-1.amazonaws.com
docker tag ml-model-api:latest \
your-account.dkr.ecr.us-east-1.amazonaws.com/ml-model-api:latest
docker push your-account.dkr.ecr.us-east-1.amazonaws.com/ml-model-api:latest
# 2. Create ECS task definition (JSON)
{
"family": "ml-model-api",
"containerDefinitions": [{
"name": "ml-api",
"image": "your-account.dkr.ecr.us-east-1.amazonaws.com/ml-model-api:latest",
"portMappings": [{
"containerPort": 8000,
"protocol": "tcp"
}],
"environment": [
{"name": "DEBUG", "value": "false"},
{"name": "WORKERS", "value": "4"}
],
"memory": 2048,
"cpu": 1024
}]
}
# 3. Create ECS service
aws ecs create-service \
--cluster your-cluster \
--service-name ml-model-api \
--task-definition ml-model-api \
--desired-count 2 \
  --launch-type FARGATE

For Lambda deployment, use Mangum:
# app/lambda_handler.py
from mangum import Mangum
from app.main import app
handler = Mangum(app)

Update requirements.txt:
mangum==0.17.0

Deploy with AWS SAM or the Serverless Framework.
# 1. Build and push to GCR
gcloud builds submit --tag gcr.io/your-project/ml-model-api
# 2. Deploy to Cloud Run
gcloud run deploy ml-model-api \
--image gcr.io/your-project/ml-model-api \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--memory 2Gi \
--cpu 2 \
--set-env-vars DEBUG=false,LOG_LEVEL=INFO
# 3. Get URL
gcloud run services describe ml-model-api --region us-central1

On Compute Engine, the setup is similar to AWS EC2: launch a VM, install Docker, and run the containers.
For GKE, see the Kubernetes section.
# 1. Login
az login
# 2. Create resource group
az group create --name ml-api-rg --location eastus
# 3. Create container registry
az acr create --resource-group ml-api-rg \
--name mlapiregistry --sku Basic
# 4. Build and push
az acr build --registry mlapiregistry \
--image ml-model-api:latest .
# 5. Deploy to ACI
az container create \
--resource-group ml-api-rg \
--name ml-model-api \
--image mlapiregistry.azurecr.io/ml-model-api:latest \
--cpu 2 --memory 4 \
--registry-login-server mlapiregistry.azurecr.io \
--registry-username $(az acr credential show \
--name mlapiregistry --query username -o tsv) \
--registry-password $(az acr credential show \
--name mlapiregistry --query passwords[0].value -o tsv) \
--dns-name-label ml-api-unique \
  --ports 8000

k8s/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: ml-model-api
labels:
app: ml-model-api
spec:
replicas: 3
selector:
matchLabels:
app: ml-model-api
template:
metadata:
labels:
app: ml-model-api
spec:
containers:
- name: api
image: your-registry/ml-model-api:latest
ports:
- containerPort: 8000
env:
- name: DEBUG
value: "false"
- name: WORKERS
value: "1"
- name: LOG_LEVEL
value: "INFO"
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /api/v1/health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /api/v1/health
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
volumeMounts:
- name: model-storage
mountPath: /app/model_storage
volumes:
- name: model-storage
persistentVolumeClaim:
claimName: model-storage-pvc
---
apiVersion: v1
kind: Service
metadata:
name: ml-model-api-service
spec:
selector:
app: ml-model-api
ports:
- protocol: TCP
port: 80
targetPort: 8000
type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: model-storage-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
      storage: 10Gi

k8s/ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ml-model-api-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- api.yourdomain.com
secretName: ml-api-tls
rules:
- host: api.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ml-model-api-service
port:
              number: 80

# Apply manifests
kubectl apply -f k8s/
# Check status
kubectl get pods
kubectl get services
# View logs
kubectl logs -f deployment/ml-model-api
# Scale
kubectl scale deployment ml-model-api --replicas=5

To deploy on a traditional VPS with systemd and Nginx:

# 1. Install dependencies
sudo apt update
sudo apt install -y python3-pip python3-venv nginx
# 2. Setup application
cd /opt
sudo git clone your-repo ml-api
cd ml-api
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt gunicorn
# 3. Create systemd service
sudo nano /etc/systemd/system/ml-api.service

/etc/systemd/system/ml-api.service:
[Unit]
Description=ML Model API
After=network.target
[Service]
User=www-data
Group=www-data
WorkingDirectory=/opt/ml-api
Environment="PATH=/opt/ml-api/venv/bin"
ExecStart=/opt/ml-api/venv/bin/gunicorn app.main:app \
-w 4 \
-k uvicorn.workers.UvicornWorker \
--bind 127.0.0.1:8000 \
--timeout 120 \
--access-logfile /var/log/ml-api-access.log \
--error-logfile /var/log/ml-api-error.log
[Install]
WantedBy=multi-user.target

# 4. Configure Nginx
sudo nano /etc/nginx/sites-available/ml-api

/etc/nginx/sites-available/ml-api:
server {
listen 80;
server_name api.yourdomain.com;
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts for ML predictions
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
}
}

# 5. Enable and start
sudo ln -s /etc/nginx/sites-available/ml-api /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
sudo systemctl enable ml-api
sudo systemctl start ml-api
# 6. Check status
sudo systemctl status ml-api

Enable HTTPS with Let's Encrypt:

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d api.yourdomain.com

Never commit .env files. Use secrets management (a minimal example of reading injected secrets follows the list below):
- AWS: AWS Secrets Manager
- GCP: Secret Manager
- Azure: Key Vault
- Kubernetes: Secrets
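All of these can inject secrets into the container as environment variables, or be queried with the provider SDK at startup. A minimal sketch, assuming an `API_KEY` variable and an AWS secret named `ml-api/prod` (both names are illustrative, not part of the project):

# Sketch only: read an API key injected as an environment variable,
# falling back to AWS Secrets Manager.
import os

def get_api_key() -> str:
    key = os.environ.get("API_KEY")
    if key:
        return key
    # Optional AWS path: requires boto3 and IAM permission to read the secret.
    import boto3
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId="ml-api/prod")
    return response["SecretString"]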
Add authentication to app/main.py:
import os

from fastapi import Request
from fastapi.responses import JSONResponse

API_KEY = os.environ["API_KEY"]  # load from your secrets manager, never hard-code it

@app.middleware("http")
async def verify_api_key(request: Request, call_next):
    # Leave the landing page, health checks, and interactive docs open.
    if request.url.path not in ["/", "/health", "/api/v1/health", "/docs", "/openapi.json"]:
        auth = request.headers.get("Authorization")
        if not auth or auth != f"Bearer {API_KEY}":
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)

Add rate limiting with SlowAPI:

pip install slowapi

from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)  # return HTTP 429 when the limit is hit

@app.get("/api/v1/predict")
@limiter.limit("10/minute")
async def predict(request: Request):
    ...

Always use HTTPS in production; terminate TLS at the reverse proxy or load balancer.
A health check endpoint is built in: `GET /api/v1/health`
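To verify it from a deploy script or CI job, the endpoint can be polled directly; the host and port below are the defaults used elsewhere in this guide, and the snippet assumes `httpx` is installed:

# Minimal health probe; adjust host/port for your environment.
import httpx

resp = httpx.get("http://localhost:8000/api/v1/health", timeout=5.0)
resp.raise_for_status()
print(resp.json())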
For Prometheus metrics, install the instrumentator:

pip install prometheus-fastapi-instrumentator

from prometheus_fastapi_instrumentator import Instrumentator
Instrumentator().instrument(app).expose(app)  # exposes a /metrics endpoint

JSON logs are already configured. Ship them to:
- CloudWatch (AWS)
- Stackdriver (GCP)
- ELK Stack
- Datadog
.github/workflows/deploy.yml:
name: Deploy
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Build Docker image
run: docker build -t ml-model-api .
- name: Run tests
        run: |
          pip install -r requirements.txt pytest
          pytest
- name: Push to registry
run: |
echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
docker tag ml-model-api your-registry/ml-model-api:latest
docker push your-registry/ml-model-api:latest
- name: Deploy to production
run: |
          # Your deployment commands

Tune the number of workers to the available CPU cores:

# Calculate: (2 x CPU cores) + 1
WORKERS=5

Models load asynchronously on startup for faster boot times.
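The sketch below illustrates that pattern with a hypothetical `load_model` coroutine and in-memory registry; the project's actual startup code may differ.

import asyncio
from fastapi import FastAPI

app = FastAPI()
models: dict[str, object] = {}

async def load_model(name: str) -> object:
    # Placeholder for expensive deserialization (e.g. joblib.load in a thread).
    await asyncio.sleep(0)
    return object()

@app.on_event("startup")
async def warm_up() -> None:
    async def _load_all() -> None:
        for name in ("classifier", "regressor"):
            models[name] = await load_model(name)
    # Load in the background so the server starts accepting traffic
    # (and passing readiness probes) before every model is ready.
    asyncio.create_task(_load_all())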
Add Redis for prediction caching:
import hashlib
import json

import redis

cache = redis.Redis(host='redis', port=6379)

@app.post("/predict/{model_name}")
async def predict(model_name: str, data: dict):
    # Use a deterministic hash; Python's built-in hash() differs across worker processes.
    digest = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
    cache_key = f"{model_name}:{digest}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    result = await model_service.predict(model_name, data)
    cache.setex(cache_key, 3600, json.dumps(result))  # cache for one hour
    return result

Before deploying to production:
- Set `DEBUG=false`
- Configure proper `LOG_LEVEL`
- Set appropriate `WORKERS`
- Enable HTTPS
- Add authentication/API keys
- Set up monitoring
- Configure rate limiting
- Set up backups for model storage
- Test health checks
- Configure auto-scaling
- Set up CI/CD pipeline
- Document API endpoints
- Load test the API
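For the last checklist item, a minimal Locust script can drive load against the API; the endpoint path and payload below are assumptions and should be adapted to the real prediction schema.

# locustfile.py - run with: locust -f locustfile.py --host http://localhost:8000
from locust import HttpUser, task, between

class PredictionUser(HttpUser):
    wait_time = between(1, 3)  # seconds between simulated requests

    @task
    def predict(self):
        # Hypothetical model name and body; replace with a real request.
        self.client.post("/predict/example-model", json={"data": {"feature_1": 1.0}})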
If memory usage is too high:

- Reduce number of workers
- Unload unused models
- Use model quantization
If predictions are too slow:

- Use async models
- Add GPU support
- Implement batch predictions (see the sketch after this list)
- Add caching
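One way to implement batch predictions is micro-batching behind an asyncio queue, sketched below; `run_model_batch` is a placeholder for the project's real vectorized inference call, and `asyncio.timeout` requires Python 3.11+.

import asyncio

# Each queue item is a (payload, future) pair.
queue: asyncio.Queue = asyncio.Queue()

async def run_model_batch(batch: list[dict]) -> list[dict]:
    # Placeholder: one vectorized model call instead of N separate calls.
    return [{"prediction": 0} for _ in batch]

async def batch_worker(max_batch: int = 32, max_wait: float = 0.01) -> None:
    # Run as a background task on startup, e.g. asyncio.create_task(batch_worker()).
    while True:
        batch = [await queue.get()]
        try:
            async with asyncio.timeout(max_wait):  # Python 3.11+
                while len(batch) < max_batch:
                    batch.append(await queue.get())
        except TimeoutError:
            pass  # batch window elapsed; process what we have
        results = await run_model_batch([payload for payload, _ in batch])
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)

async def predict_batched(payload: dict) -> dict:
    # Call from the endpoint; resolves once the batch containing this payload runs.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((payload, fut))
    return await fut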
If models fail to load:

- Check logs: `docker logs ml-api`
- Increase memory limits
- Check model file paths
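A quick way to check the model file paths from inside the container, assuming the `/app/model_storage` mount used in the examples above:

# List what the API can actually see in its model storage volume.
from pathlib import Path

storage = Path("/app/model_storage")
print("exists:", storage.exists())
for path in sorted(storage.rglob("*")):
    print(path.relative_to(storage), path.stat().st_size, "bytes")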
Ready to Deploy!