This repository was archived by the owner on Sep 22, 2025. It is now read-only.
Guardian Alerts: LSTM anomaly + RF/MLP classifier, vitals overlay, alert logic & runbook #352
Summary
This PR covers the work completed in Sprint 1 and Sprint 2 of the Guardian Alerts Monitoring project.
Sprint 1: Data Generation and Synthetic Dataset Creation
Developed Python scripts and Jupyter notebooks that automate synthetic patient data generation.
Produced a standardized, clinically realistic dataset of 500 patient records with consistent demographic information, observation windows, and vital signs.
Added a DatasetValidator script that automatically checks the integrity and consistency of every generated dataset before it is used in the pipeline. It verifies column types, expected value ranges, and patient ID consistency, and catches common data errors early.
All generated datasets follow the schema and requirements of the GMAlerts system, supporting both initial model prototyping and pipeline validation.
The scripts are designed for reproducibility and scalability, so the dataset can be quickly expanded or modified for future research and development.
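For reviewers, here is a minimal sketch of the kinds of checks DatasetValidator performs (column types, value ranges, patient ID consistency). The column names, ranges, and the validate_records / ValidationReport names are hypothetical; the real script's schema and interface may differ.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only: column -> (type, min, max).
# These are NOT the actual GMAlerts columns or clinical ranges.
SCHEMA = {
    "patient_id": (str, None, None),
    "age": (int, 0, 120),
    "spo2": (float, 50.0, 100.0),
    "temperature_c": (float, 30.0, 43.0),
}


@dataclass
class ValidationReport:
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_records(records, schema=SCHEMA, known_ids=None):
    """Check presence, type, and range of each column, plus patient ID membership."""
    report = ValidationReport()
    for i, row in enumerate(records):
        for col, (typ, lo, hi) in schema.items():
            value = row.get(col)
            if value is None:
                report.errors.append(f"row {i}: missing {col}")
                continue
            # Accept ints where floats are expected (e.g. SpO2 recorded as 98).
            allowed = (int, float) if typ is float else typ
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, allowed):
                report.errors.append(f"row {i}: {col} has type {type(value).__name__}")
                continue
            if lo is not None and not lo <= value <= hi:
                report.errors.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
        # Every patient_id must belong to the known cohort, when one is supplied.
        if known_ids is not None and row.get("patient_id") not in known_ids:
            report.errors.append(f"row {i}: unknown patient_id {row.get('patient_id')!r}")
    return report


rows = [
    {"patient_id": "P001", "age": 54, "spo2": 97.0, "temperature_c": 36.8},
    {"patient_id": "P002", "age": 61, "spo2": 130.0, "temperature_c": 37.1},
]
report = validate_records(rows, known_ids={"P001", "P002"})
print(report.ok)      # False
print(report.errors)  # ['row 1: spo2=130.0 outside [50.0, 100.0]']
```

Collecting every error in a report, rather than raising on the first one, lets a single run surface all problems in a generated file.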
Sprint 2: Alerts Monitoring Pipeline Implementation
Implemented the end-to-end alerts monitoring pipeline featuring:
LSTM autoencoder anomaly detection, with IsolationForest as a fallback.
A behavioral classifier using Random Forest and MLP models.
Clinically anchored vital sign overlays for SpO₂, temperature (°C), blood pressure, daily exercise, and meals skipped.
A single risk_level (Low/Medium/High) per record, with a human-readable reason field.
Visualizations and runbook documentation to support both technical and clinical review.
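The LSTM autoencoder path flags a window when its reconstruction error exceeds a threshold fit on training data. Below is a minimal, dependency-free sketch of that scoring step. It assumes the reconstructions have already been produced by the model and that each window is flattened to a list of floats; the 95th-percentile cut-off and the thresholds.json key names are illustrative assumptions, not the notebook's actual settings.

```python
import json
import math
from pathlib import Path


def reconstruction_errors(windows, reconstructions):
    """Mean squared error between each (flattened) window and its reconstruction."""
    if len(windows) != len(reconstructions):
        raise ValueError("windows and reconstructions must have the same length")
    return [
        sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        for x, x_hat in zip(windows, reconstructions)
    ]


def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100] (matches numpy's default method)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_threshold(train_errors, q=95.0, path=None):
    """Pick the q-th percentile of training errors; optionally persist it as JSON."""
    threshold = percentile(train_errors, q)
    if path is not None:
        # Hypothetical key names; the real thresholds.json layout may differ.
        payload = {"recon_percentile": q, "recon_threshold": threshold}
        Path(path).write_text(json.dumps(payload, indent=2))
    return threshold


def flag_anomalies(errors, threshold):
    """A window is anomalous when its error is strictly above the threshold."""
    return [e > threshold for e in errors]


train = reconstruction_errors([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 1.0]])
print(train)  # [0.0, 0.5]
```

Fitting the cut-off as a percentile of training errors means roughly (100 - q)% of normal training windows would be flagged, which makes the false-alarm rate an explicit, tunable choice.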
Outputs include:
Processed datasets with risk scores and reasons for alerts.
Trained model artifacts and supporting metadata for reproducible results.
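To illustrate how per-signal flags can collapse into a single risk_level plus a reason string, here is a minimal rule-based sketch. The field names, cut-offs, and escalation policy (keep the most severe triggered rule) are hypothetical placeholders, not validated clinical thresholds and not necessarily the notebook's alert logic.

```python
# Severity ordering used to keep the most severe triggered rule.
RANK = {"Low": 0, "Medium": 1, "High": 2}

# Placeholder rules: (field, predicate, severity, reason).
# Cut-offs are illustrative only, NOT validated clinical thresholds.
VITAL_RULES = [
    ("spo2", lambda v: v < 92, "High", "SpO2 below 92%"),
    ("systolic_bp", lambda v: v >= 160, "High", "systolic BP at or above 160 mmHg"),
    ("temperature_c", lambda v: v >= 38.0, "Medium", "temperature at or above 38.0 °C"),
    ("meals_skipped", lambda v: v >= 2, "Medium", "two or more meals skipped"),
]


def assess(record, anomaly=False, clf_prob=0.0, clf_cutoff=0.5):
    """Return (risk_level, reason) for one record."""
    level, reasons = "Low", []

    def escalate(severity, reason):
        nonlocal level
        reasons.append(reason)
        if RANK[severity] > RANK[level]:
            level = severity

    for field, predicate, severity, reason in VITAL_RULES:
        value = record.get(field)
        if value is not None and predicate(value):
            escalate(severity, reason)
    # Model signals (anomaly flag, classifier probability) raise the level to at least Medium.
    if anomaly:
        escalate("Medium", "anomalous behavior window")
    if clf_prob >= clf_cutoff:
        escalate("Medium", f"classifier risk probability {clf_prob:.2f}")
    return level, "; ".join(reasons) or "no alert rule triggered"


print(assess({"spo2": 90, "temperature_c": 38.2}))
# ('High', 'SpO2 below 92%; temperature at or above 38.0 °C')
```

Keeping every triggered reason, not just the one that set the level, gives clinical reviewers the full context behind each alert.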
What’s Included
AlertSystemTask/AlertSystemScript.ipynb: End-to-end notebook with auto-discovery of the latest dataset (New AI spreadsheet - Sheet1.csv), feature engineering, model training, prediction, and result visualization.
Synthetic data generation scripts and sample datasets (GMAlertsDataset.csv, Synthetic_Output.csv, etc.) from Sprint 1, supporting the pipeline and downstream tasks.
Documentation:
AlertSystemTask/RUNBOOK_GM_Alerts.md: How to execute the pipeline.
AlertSystemTask/README.md: System architecture, design choices, and pipeline overview.
Output artifacts:
alerts.csv: Per-user, timestamped risk-level predictions with explanations.
Trained model files (lstm.pt, iforest.pkl, clf.pkl, scaler.pkl, thresholds.json).
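The notebook auto-discovers the latest dataset. One plausible approach, assumed here rather than taken from the notebook, is to pick the most recently modified CSV in a folder; find_latest_dataset is a hypothetical helper.

```python
from pathlib import Path


def find_latest_dataset(directory=".", pattern="*.csv"):
    """Return the most recently modified file matching pattern, or None if none exist."""
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Newest by filesystem modification time (an assumption about what "latest" means).
    return max(candidates, key=lambda p: p.stat().st_mtime)


latest = find_latest_dataset("AlertSystemTask")
print(latest if latest else "no CSV found")
```

Modification time is simple but fragile: copying or touching an older file makes it "latest". A naming convention (e.g. a date in the filename) is a more deterministic alternative.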
How to Run
Install notebook dependencies
pip install pandas numpy scikit-learn joblib matplotlib torch  # torch is optional (used for the LSTM autoencoder)
Launch the notebook interface
jupyter notebook AlertSystemTask/AlertSystemScript.ipynb
Execute all cells; outputs are saved to AlertSystemTask/artifacts/
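Because torch is optional in the install step and IsolationForest is the stated fallback, a quick pre-flight check can show which anomaly path the current environment supports. This is a hypothetical helper, not code from the notebook; whether the notebook selects its backend in exactly this way is an assumption.

```python
import importlib.util


def choose_anomaly_backend():
    """Return "lstm" if PyTorch is importable, otherwise the "iforest" fallback."""
    # find_spec locates the package without importing it (cheap, no side effects).
    return "lstm" if importlib.util.find_spec("torch") is not None else "iforest"


print(f"Anomaly backend available: {choose_anomaly_backend()}")
```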