This repository was archived by the owner on Sep 22, 2025. It is now read-only.

@HarshaDamarla commented Sep 17, 2025

Summary

This update covers major developments completed across both Sprint 1 and Sprint 2 for the Guardian Alerts Monitoring project.

Sprint 1: Data Generation and Synthetic Dataset Creation

Developed robust Python scripts and Jupyter notebooks for automated synthetic patient data generation.

Produced a standardized, clinically realistic dataset comprising 500 patient records, including consistent demographic information, observation windows, and health vitals.

Developed a DatasetValidator script that automatically checks the integrity and consistency of every generated dataset before it is used in the pipeline. It verifies column types, expected value ranges, and patient ID consistency, catching common data errors early.

All generated datasets align with the schema and requirements of the GMAlerts system, supporting both initial model prototyping and pipeline validation.

The scripts are built for reproducibility and scalability, so the dataset can be quickly expanded or modified for future research and development.
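The generate-then-validate flow described above might look roughly like the sketch below. It is not the project's actual code: the column names, value ranges, the one-row-per-patient-per-day layout, and the DatasetValidator.validate method are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical schema: column -> (numpy dtype kind, min, max).
# The real GMAlerts schema is not shown in this PR; names and ranges are illustrative.
SCHEMA = {
    "patient_id": ("i", 1, None),
    "age": ("i", 18, 100),
    "spo2": ("f", 70.0, 100.0),
    "temperature_c": ("f", 34.0, 42.0),
    "systolic_bp": ("i", 70, 220),
    "meals_skipped": ("i", 0, 3),
}


def generate_synthetic(n_patients=500, days=7, seed=42):
    """Generate one row per patient per day (assumed layout) with plausible vitals."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible output
    n = n_patients * days
    # Age is drawn once per patient so demographics stay consistent across days.
    ages = np.repeat(rng.integers(18, 90, size=n_patients), days)
    return pd.DataFrame({
        "patient_id": np.repeat(np.arange(1, n_patients + 1), days),
        "day": np.tile(np.arange(days), n_patients),
        "age": ages,
        "spo2": np.clip(rng.normal(97, 1.5, n), 70, 100).round(1),
        "temperature_c": np.clip(rng.normal(36.8, 0.4, n), 34, 42).round(1),
        "systolic_bp": np.clip(rng.normal(122, 15, n), 70, 220).round().astype(int),
        "meals_skipped": rng.integers(0, 4, size=n),
    })


class DatasetValidator:
    """Hypothetical validator: checks columns, dtypes, ranges, NaNs, and ID consistency."""

    def __init__(self, schema):
        self.schema = schema

    def validate(self, df):
        errors = []
        for col, (kind, lo, hi) in self.schema.items():
            if col not in df.columns:
                errors.append(f"missing column: {col}")
                continue
            if df[col].dtype.kind != kind:
                errors.append(f"{col}: expected dtype kind {kind!r}, got {df[col].dtype.kind!r}")
            if lo is not None and (df[col] < lo).any():
                errors.append(f"{col}: values below {lo}")
            if hi is not None and (df[col] > hi).any():
                errors.append(f"{col}: values above {hi}")
        if df.isna().any().any():
            errors.append("dataset contains missing values")
        # A patient's age should not change within the observation window.
        if {"patient_id", "age"} <= set(df.columns):
            if (df.groupby("patient_id")["age"].nunique() > 1).any():
                errors.append("inconsistent age within a patient_id")
        return errors
```

On the generated frame the validator returns an empty list; overwriting a single SpO₂ reading with 120.0 makes it report a spo2 range error, which is the kind of early catch the validator is meant to provide.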

Sprint 2: Alerts Monitoring Pipeline Implementation

Implemented the end-to-end alerts monitoring pipeline featuring:

LSTM autoencoder anomaly detection, with IsolationForest as a fallback.

A behavioral classifier using Random Forest and MLP architectures.

Integration of clinically anchored overlays covering vital signs (SpO₂, temperature in °C, blood pressure) and lifestyle signals (daily exercise, meals skipped).

Generation of a single risk_level (Low/Medium/High) per record, accompanied by a human-readable reason field.

Comprehensive visualizations and detailed runbook documentation to support both technical and clinical review.
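The anomaly-detection stage pairs an LSTM autoencoder with an IsolationForest fallback. A minimal sketch of that pattern follows, assuming input windows shaped (n_windows, timesteps, features) and a hypothetical 95th-percentile threshold; the function names, network size, and training settings are illustrative, not taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

try:  # torch is optional, matching the install step
    import torch
    from torch import nn
    HAVE_TORCH = True
except ImportError:
    HAVE_TORCH = False


def iforest_scores(X, seed=0):
    """Fallback detector: negate score_samples so higher = more anomalous."""
    model = IsolationForest(n_estimators=200, random_state=seed).fit(X)
    return -model.score_samples(X)


def lstm_ae_scores(windows, hidden=16, epochs=30, seed=0):
    """Per-window reconstruction error of a small LSTM autoencoder."""
    torch.manual_seed(seed)
    _, T, F = windows.shape

    class LSTMAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.LSTM(F, hidden, batch_first=True)
            self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, F)

        def forward(self, x):
            _, (h, _) = self.encoder(x)             # h: (layers, batch, hidden)
            z = h[-1].unsqueeze(1).repeat(1, T, 1)  # repeat latent code over time
            dec, _ = self.decoder(z)
            return self.out(dec)                    # (batch, T, F)

    x = torch.tensor(windows, dtype=torch.float32)
    model = LSTMAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):  # full-batch training, fine for a small sketch
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        err = ((model(x) - x) ** 2).mean(dim=(1, 2))
    return err.numpy()


def detect_anomalies(windows, quantile=0.95, prefer_lstm=True):
    """Use the LSTM autoencoder when torch is available, else IsolationForest."""
    if prefer_lstm and HAVE_TORCH:
        scores, method = lstm_ae_scores(windows), "lstm_ae"
    else:
        flat = windows.reshape(len(windows), -1)  # IsolationForest needs 2-D input
        scores, method = iforest_scores(flat), "iforest"
    threshold = float(np.quantile(scores, quantile))
    return scores, scores > threshold, method, threshold
```

Both detectors return "higher means more anomalous", so one percentile threshold works for either path. In practice the features would be standardized before the autoencoder sees them; that step is omitted here for brevity.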

Outputs include:

Processed datasets with risk scores and reasons for alerts.

Trained model artifacts and supporting metadata for reproducible results.
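One way to turn overlays, an anomaly flag, and a classifier probability into a single risk_level plus a human-readable reason is a weighted rule table, sketched below. Every cut-off, weight, and field name here is a hypothetical placeholder, not the notebook's logic and not clinical guidance; the anomaly flag and classifier probability are assumed to come from upstream stages.

```python
# Hypothetical overlay rules: (field, predicate, reason text, weight).
# Illustrative thresholds only; not taken from the GM Alerts notebook.
OVERLAYS = [
    ("spo2", lambda v: v < 92, "SpO₂ below 92%", 2),
    ("temperature_c", lambda v: v >= 38.0, "temperature ≥ 38.0 °C", 2),
    ("systolic_bp", lambda v: v >= 140, "systolic BP ≥ 140 mmHg", 1),
    ("exercise_min", lambda v: v < 10, "under 10 min of daily exercise", 1),
    ("meals_skipped", lambda v: v >= 2, "2+ meals skipped", 1),
]


def assign_risk(record, anomaly_flag=False, clf_prob=0.0):
    """Return (risk_level, reason) for one record given as a dict of readings."""
    points, reasons = 0, []
    for field, rule, text, weight in OVERLAYS:
        value = record.get(field)
        if value is not None and rule(value):  # missing fields are simply skipped
            points += weight
            reasons.append(text)
    if anomaly_flag:
        points += 2
        reasons.append("anomalous vital-sign pattern")
    if clf_prob >= 0.5:
        points += 1
        reasons.append(f"behavioral classifier risk {clf_prob:.2f}")
    level = "High" if points >= 4 else "Medium" if points >= 2 else "Low"
    reason = "; ".join(reasons) if reasons else "all readings within expected ranges"
    return level, reason
```

For example, a record with SpO₂ 90 and temperature 38.4 °C scores 4 points and is labeled High, with the reason "SpO₂ below 92%; temperature ≥ 38.0 °C". Keeping the reason as the list of rules that fired means the explanation always matches the score.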

What’s Included

AlertSystemTask/AlertSystemScript.ipynb: End-to-end notebook with auto-discovery of the latest dataset (New AI spreadsheet - Sheet1.csv), feature engineering, model training, prediction, and result visualization.

Synthetic data generation scripts and sample datasets (GMAlertsDataset.csv, Synthetic_Output.csv, etc.) from Sprint 1, supporting the pipeline and downstream tasks.

Documentation:

AlertSystemTask/RUNBOOK_GM_Alerts.md: How to execute the pipeline.

AlertSystemTask/README.md: System architecture, design choices, and pipeline overview.

Output artifacts:

alerts.csv: Per-user, timestamped risk-level predictions with explanations.

Trained model files (lstm.pt, iforest.pkl, clf.pkl, scaler.pkl, thresholds.json).
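The notebook auto-discovers the latest dataset. A minimal sketch of such a lookup, assuming "latest" means most recently modified, is shown below; the function name and its preferred and exclude_dirs parameters are hypothetical. Skipping the artifacts folder matters because the pipeline's own alerts.csv would otherwise be the newest CSV on disk.

```python
from pathlib import Path


def find_latest_dataset(search_dir=".", pattern="*.csv", preferred=None,
                        exclude_dirs=("artifacts",)):
    """Return the most recently modified file matching pattern under search_dir.

    Files inside any directory named in exclude_dirs (e.g. the pipeline's
    output folder) are skipped. If `preferred` is given and a file with exactly
    that name exists, only such files are considered.
    """
    root = Path(search_dir)
    candidates = [
        p for p in root.rglob(pattern)
        if p.is_file()
        # check only directory components, not the file name itself
        and not any(part in exclude_dirs for part in p.relative_to(root).parts[:-1])
    ]
    if not candidates:
        raise FileNotFoundError(f"no {pattern!r} files under {root.resolve()}")
    if preferred is not None:
        named = [p for p in candidates if p.name == preferred]
        if named:
            candidates = named
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

A call such as find_latest_dataset("AlertSystemTask", preferred="New AI spreadsheet - Sheet1.csv") would return that file when it exists and otherwise fall back to the newest CSV outside the artifacts folder.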

How to Run

Install notebook dependencies

pip install pandas numpy scikit-learn joblib matplotlib torch # torch optional

Launch the notebook interface

jupyter notebook AlertSystemTask/AlertSystemScript.ipynb

Execute all cells; outputs are saved to AlertSystemTask/artifacts/
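The behavioral classifier compares Random Forest and MLP models, and the run saves artifacts such as clf.pkl and scaler.pkl. The sketch below shows one plausible way to do that with scikit-learn and joblib, assuming a binary label and selection by held-out accuracy. The function names and the thresholds.json layout are hypothetical; the PR does not describe that file's contents.

```python
import json
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def train_and_save(X, y, out_dir="AlertSystemTask/artifacts", seed=0):
    """Fit a Random Forest and an MLP, keep the better one on a held-out split,
    and save it with its scaler and a small JSON metadata file."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    scaler = StandardScaler().fit(X_tr)  # fit on training rows only
    models = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed),
    }
    val_acc = {}
    for name, model in models.items():
        model.fit(scaler.transform(X_tr), y_tr)
        val_acc[name] = float(model.score(scaler.transform(X_va), y_va))
    best = max(val_acc, key=val_acc.get)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(models[best], out / "clf.pkl")
    joblib.dump(scaler, out / "scaler.pkl")
    # Hypothetical metadata layout; the real thresholds.json contents are not described.
    meta = {"classifier": best, "val_accuracy": val_acc, "risk_prob_threshold": 0.5}
    (out / "thresholds.json").write_text(json.dumps(meta, indent=2))
    return best, val_acc


def load_and_score(X_new, out_dir="AlertSystemTask/artifacts"):
    """Reload the saved artifacts and return positive-class probabilities."""
    out = Path(out_dir)
    clf = joblib.load(out / "clf.pkl")
    scaler = joblib.load(out / "scaler.pkl")
    return clf.predict_proba(scaler.transform(X_new))[:, 1]  # assumes binary labels
```

Saving the scaler alongside the model ensures new records are transformed exactly as the training data was, which is what makes reloaded predictions reproducible.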

@BhuvanPS self-requested a review on September 18, 2025 08:18

@BhuvanPS left a comment

Good to go!
