This project demonstrates a complete, end-to-end framework for real-time patient monitoring using a distributed AI architecture. The system ingests simulated IoT data at the edge using EdgeX Foundry, processes it with a Deep Reinforcement Learning (DRL) model running in a Kubernetes cluster, and simulates a Federated Learning (FL) loop for collaborative, privacy-preserving model improvement.
A key feature is the implementation of dynamic, performance-based autoscaling, allowing the system to scale based on either CPU load or application latency.
The architecture is divided into three logical layers: the Edge, the Edge Cluster, and the Central Cloud.
- **Edge (Data Ingestion):** An EdgeX Foundry stack running in Docker simulates IoT devices, ingests data, and forwards it for processing.
- **Edge Cluster (Real-Time Processing):** A Kubernetes cluster, also running at the edge, hosts the DRL application, the monitoring stack, and the autoscaling components.
- **Central Cloud (Collaborative Learning):** A central server manages the Federated Learning process by aggregating model updates and distributing improved global models.
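The Central Cloud's aggregation step can be pictured as simple federated averaging. The sketch below is a hedged illustration of that idea, not the project's actual FL code; the function name and the flat-list weight format are assumptions:

```python
# Minimal federated-averaging sketch (hypothetical; the real project may use a
# framework-specific update format and weighting scheme).

def fed_avg(client_updates):
    """Average model weights from several edge clients.

    client_updates: list of (num_samples, weights) pairs, where weights is a
    flat list of floats. Clients with more data get proportionally more say.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    global_weights = [0.0] * dim
    for n, weights in client_updates:
        for i, w in enumerate(weights):
            global_weights[i] += (n / total) * w
    return global_weights

# Two edge sites report updates; the second has twice the data,
# so the global model lands closer to its weights (roughly [3.0, 4.0]).
updates = [(100, [1.0, 2.0]), (200, [4.0, 5.0])]
print(fed_avg(updates))
```

In a real deployment the privacy benefit comes from sending only these weight updates, never the raw patient readings, to the central server.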
| Category | Technology | Purpose |
|---|---|---|
| IoT & Edge | EdgeX Foundry | Open-source IoT middleware for data ingestion and processing at the edge. |
| Containerization | Docker, Docker Compose | To package and run all EdgeX and FL Server services consistently. |
| Orchestration | Kubernetes | To manage, deploy, and automatically scale the DRL application at the edge. |
| Monitoring | Prometheus, Grafana | For collecting time-series metrics (latency, CPU) and visualizing them in real-time dashboards. |
| AI/ML | PyTorch, TensorFlow Lite | To build, train, and run the DRL model (edge) and the Global model (cloud). |
| Application | Python, Flask | The framework for the DRL edge agent and the central FL server. |
- Docker Desktop with Kubernetes enabled
- `kubectl` command-line tool
- `helm` for managing Kubernetes packages
- Start EdgeX Services (the registration calls below talk to core-metadata, so bring the stack up first):

  ```bash
  docker-compose -f docker-compose-no-secty.yml up -d --build
  ```

- Register EdgeX Components:

  ```bash
  # 1. Register Device Profile
  curl http://localhost:59881/api/v2/deviceprofile -H "Content-Type: application/json" -d '@path/to/patient-monitor-profile.json'

  # 2. Register Device Service
  curl http://localhost:59881/api/v2/deviceservice -H "Content-Type: application/json" -d '@path/to/patient-monitor-service.json'

  # 3. Register Device
  curl http://localhost:59881/api/v2/device -H "Content-Type: application/json" -d '@path/to/patient-monitor-device.json'
  ```
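Once the stack is up, you can sanity-check ingestion by posting a simulated reading to core-data. The following is a hedged sketch: the profile/device/resource names are placeholders that must match what you registered, and the port 59880 and `/api/v2/event/...` route assume a default no-security EdgeX v2 deployment:

```python
import json
import time
import uuid

# Hypothetical names; substitute the ones defined in your registered
# profile/device JSON files.
PROFILE = "patient-monitor-profile"
DEVICE = "patient-monitor-device"
RESOURCE = "HeartRate"

def build_event(value: int) -> dict:
    """Build an EdgeX v2 event envelope carrying one simulated reading."""
    origin = time.time_ns()  # EdgeX origins are nanosecond timestamps
    return {
        "apiVersion": "v2",
        "event": {
            "apiVersion": "v2",
            "id": str(uuid.uuid4()),
            "deviceName": DEVICE,
            "profileName": PROFILE,
            "sourceName": RESOURCE,
            "origin": origin,
            "readings": [{
                "id": str(uuid.uuid4()),
                "deviceName": DEVICE,
                "profileName": PROFILE,
                "resourceName": RESOURCE,
                "origin": origin,
                "valueType": "Int64",
                "value": str(value),
            }],
        },
    }

if __name__ == "__main__":
    import urllib.request
    payload = json.dumps(build_event(72)).encode()
    # Assumed default no-security core-data port and v2 event route.
    url = f"http://localhost:59880/api/v2/event/{PROFILE}/{DEVICE}/{RESOURCE}"
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        print(urllib.request.urlopen(req, timeout=5).status)
    except OSError as exc:
        print("core-data not reachable:", exc)
```

If the POST succeeds, the reading should be visible via core-data's query endpoints and will flow onward to the DRL application.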
- Build and Push Your Application Image:
  - Navigate to the `latency-app` directory
  - Build the image: `docker build -t your-dockerhub-username/latency-app:v1 .`
  - Push the image: `docker push your-dockerhub-username/latency-app:v1`
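The `latency-app` has to expose an average request latency for the latency-based HPA to consume. As a hedged sketch of the underlying idea (the metric name comes from the HPA config later in this README; the tracker class here is an illustration, not the app's actual code, which likely uses `prometheus_client` or `prometheus_flask_exporter`), an average can be derived from a cumulative sum and count, the way Prometheus summaries work:

```python
import time
from contextlib import contextmanager

class LatencyTracker:
    """Toy stand-in for the two Prometheus series behind a request-duration
    average: ..._seconds_sum and ..._seconds_count."""

    def __init__(self):
        self.duration_sum = 0.0   # like flask_http_request_duration_seconds_sum
        self.request_count = 0    # like flask_http_request_duration_seconds_count

    @contextmanager
    def track(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.duration_sum += time.perf_counter() - start
            self.request_count += 1

    def average_seconds(self) -> float:
        """The quantity the HPA would consume as ..._seconds_avg."""
        if self.request_count == 0:
            return 0.0
        return self.duration_sum / self.request_count

tracker = LatencyTracker()
for delay in (0.01, 0.03):       # simulate two handled requests
    with tracker.track():
        time.sleep(delay)
print(f"avg latency = {tracker.average_seconds():.3f}s over {tracker.request_count} requests")
```

In the real app the `track()` equivalent wraps each Flask request handler, and Prometheus computes the average server-side from the scraped sum and count series.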
- Deploy to Kubernetes:
  - Navigate to the `k8s-config` directory
  - Update the `image:` in `deployment.yaml` to point to the image you just pushed
  - Apply the manifests:

    ```bash
    kubectl apply -f deployment.yaml
    kubectl apply -f service.yaml
    ```

- Configure HPA for CPU:
  ```yaml
  # hpa.yaml
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
  ```

- Apply the HPA:
  ```bash
  kubectl apply -f hpa.yaml
  ```

- Install Prometheus Adapter:
  ```bash
  helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter \
    --namespace monitoring \
    -f custom-metrics-values.yaml \
    --set prometheus.url=http://monitoring-kube-prometheus-prometheus.monitoring.svc
  ```

- Configure HPA for Latency:
  ```yaml
  # hpa.yaml
  metrics:
    - type: Pods
      pods:
        metric:
          name: flask_http_request_duration_seconds_avg
        target:
          type: AverageValue
          averageValue: "500m"  # 0.5 s; the metric is in seconds, so a bare "500" would mean 500 s
  ```

- Apply the HPA:
  ```bash
  kubectl apply -f hpa.yaml
  ```

- Start the Data Flow: Ensure your EdgeX stack is running with `docker-compose up -d`
- Monitor HPA Status:
  ```bash
  kubectl get hpa -w local-target-app-hpa
  ```

- Watch Pods Scale:
  ```bash
  kubectl get pods -w
  ```

- View Grafana Dashboards:
  ```bash
  # Forward the port
  kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
  ```

  Then open http://localhost:3000 in your browser.

As the load from EdgeX increases the CPU or latency of your application, you will see the HPA detect the change and increase the number of REPLICAS, and new pods will be created in real time.
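For reference, the latency metric reaches the HPA through a rule in `custom-metrics-values.yaml` that converts raw Prometheus series into a per-pod average. The exact rule in this repo may differ; the sketch below assumes the app exports standard `..._sum`/`..._count` series and follows the prometheus-adapter rule format:

```yaml
# Hypothetical prometheus-adapter rule; adjust the series and label names to
# match what your application actually exports.
rules:
  custom:
    - seriesQuery: 'flask_http_request_duration_seconds_sum{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_sum$"
        as: "${1}_avg"
      metricsQuery: >-
        rate(flask_http_request_duration_seconds_sum{<<.LabelMatchers>>}[2m])
        / rate(flask_http_request_duration_seconds_count{<<.LabelMatchers>>}[2m])
```

The `name` stanza is what makes the HPA-visible metric end in `_avg` even though Prometheus itself only stores the sum and count.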
If `kubectl get hpa` shows the target metric as `<unknown>`, the HPA cannot get the metric.
- For CPU: Ensure the Kubernetes Metrics Server is running (`kubectl get pods -n kube-system`)
- For Latency: This is a complex issue. Check the Prometheus Adapter logs, ensure its rules in `custom-metrics-values.yaml` are correct (especially the job label), and verify the HPA has the right RBAC permissions.
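A quick way to confirm the adapter is serving the latency metric is to query the custom metrics API directly with `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"`. The sketch below parses an assumed example of a healthy response (the resource list shape follows the standard Kubernetes `APIResourceList` format; the exact output of your cluster may differ):

```python
import json

# Assumed example response from:
#   kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
sample = json.loads("""
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "pods/flask_http_request_duration_seconds_avg",
     "singularName": "", "namespaced": true,
     "kind": "MetricValueList", "verbs": ["get"]}
  ]
}
""")

METRIC = "flask_http_request_duration_seconds_avg"
served = any(r["name"].endswith(METRIC) for r in sample["resources"])
print("adapter serves latency metric:", served)
```

If your metric is absent from the `resources` list, the problem is in the adapter's rules rather than in the HPA itself.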
If your application's metrics never appear in Prometheus, it means Prometheus is not scraping your application.
- Ensure your `deployment.yaml` has the correct `prometheus.io/scrape: "true"` annotations
- If using the Prometheus Operator, ensure you have a ServiceMonitor manifest that correctly targets your application's Service
- Check for port conflicts on your host machine
- Ensure all necessary environment variables are set in `docker-compose-no-secty.yml`
- Use `docker-compose logs` to debug the specific service that is failing
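For the port-conflict check, a small script can probe the usual no-security EdgeX ports before you bring the stack up. The port list below is an assumption based on standard EdgeX defaults; verify it against your `docker-compose-no-secty.yml`:

```python
import socket

# Assumed default no-security EdgeX ports; adjust to your compose file.
EDGEX_PORTS = {
    "core-data": 59880,
    "core-metadata": 59881,
    "core-command": 59882,
    "consul": 8500,
    "redis": 6379,
}

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

for name, port in EDGEX_PORTS.items():
    status = "IN USE" if port_in_use(port) else "free"
    print(f"{name:>13} ({port}): {status}")
```

Any port reported as in use before EdgeX starts points at the conflicting process to stop (or a port to remap in the compose file).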
Please feel free to submit issues and pull requests to improve this project.
This project is provided as-is for educational and research purposes.