End-to-end space weather ML pipeline for data ingestion, preprocessing, label generation, model training, and probability calibration.
Repository layout:

- data_sources/: data download and collection scripts
- database_builder/: raw data warehouse and table construction
- preprocessing_pipeline/: feature engineering, aggregation, splits, normalization, labels, and final merge
- modeling_pipeline/: training and evaluation scripts (multi-horizon)
- modeling_pipeline_daily/: legacy daily modeling utilities and plots
- probability_calibration/: calibration DB builder, regime-aware isotonic calibration, and plots
- tests/: test suite

Typical workflow:
- Build or refresh raw data sources.
- Run preprocessing pipelines per data source.
- Merge final datasets into a unified SQLite DB.
- Train models for horizons 1–8.
- Build calibration DB and fit regime-aware calibrators.
- Plot diagnostics as needed.
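A minimal sketch of a driver that chains the later stages in this order is shown below. The file name run_pipeline.py is hypothetical, the script paths are taken from the usage examples later in this README, and the per-source download and preprocessing scripts (which vary by data source) are omitted.

```python
# run_pipeline.py -- hypothetical driver chaining the later pipeline stages.
# Script paths match the usage examples in this README; per-source download
# and preprocessing steps are assumed to have been run beforehand.
import subprocess
import sys

STAGES = [
    "preprocessing_pipeline/check_multicolinearity/merge_features.py",  # merge into unified SQLite DB
    "modeling_pipeline/train_model.py",                                  # train models (horizons 1-8)
    "probability_calibration/build_calibration_db.py",                   # build calibration DB
    "probability_calibration/regime_aware_calibration.py",               # fit regime-aware calibrators
]

for script in STAGES:
    print(f"Running {script} ...")
    result = subprocess.run([sys.executable, script])
    if result.returncode != 0:
        sys.exit(f"Stage failed: {script}")
```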
Key outputs:

- preprocessing_pipeline/check_multicolinearity/all_preprocessed_sources.db: merged feature/label dataset
- modeling_pipeline/output_h{X}/: per-horizon models and diagnostics
- probability_calibration/validation_calibration.db: calibration dataset
- probability_calibration/calibration_h{X}/: per-horizon isotonic calibrators + metadata
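Both the merged dataset and the calibration dataset are ordinary SQLite files, so they can be inspected without any project code. A small sketch, assuming only that the file exists at the path listed above (table names are whatever the merge step created):

```python
# Quick inspection of the merged SQLite dataset; the path comes from the
# output list above, and table names are discovered from sqlite_master.
import sqlite3

DB_PATH = "preprocessing_pipeline/check_multicolinearity/all_preprocessed_sources.db"

with sqlite3.connect(DB_PATH) as conn:
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        print(f"{table}: {count} rows")
```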
Most scripts are executable as standalone Python files. Examples:

```bash
/bin/python3 preprocessing_pipeline/check_multicolinearity/merge_features.py
/bin/python3 modeling_pipeline/train_model.py
/bin/python3 probability_calibration/build_calibration_db.py
/bin/python3 probability_calibration/regime_aware_calibration.py
```
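The calibration scripts themselves are not reproduced here, but the core idea of regime-aware isotonic calibration is to fit a separate isotonic mapping from raw predicted probabilities to observed outcomes within each regime, then apply the matching calibrator at prediction time. A minimal sketch using scikit-learn; the function names, array layout, and regime encoding are assumptions, not the repository's actual implementation:

```python
# Sketch of regime-aware isotonic calibration: one isotonic fit per regime.
# Array names and regime encoding are placeholders, not the project's real layout.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_regime_calibrators(raw_probs, outcomes, regimes):
    """Fit one isotonic calibrator per distinct regime value."""
    calibrators = {}
    for regime in np.unique(regimes):
        mask = regimes == regime
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(raw_probs[mask], outcomes[mask])
        calibrators[regime] = iso
    return calibrators

def apply_calibration(calibrators, raw_probs, regimes):
    """Map raw probabilities through the calibrator of each sample's regime."""
    calibrated = np.array(raw_probs, dtype=float)
    for regime, iso in calibrators.items():
        mask = regimes == regime
        calibrated[mask] = iso.predict(raw_probs[mask])
    return calibrated
```

This sketch assumes every regime seen at prediction time was also present during fitting; samples from unseen regimes simply keep their raw probabilities here.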
Notes:

- Databases are SQLite and live under their respective pipeline directories.
- Many stages rely on environment variables for split windows and aggregation cadence.
- Horizon selection for training and calibration is handled by constants in the scripts.
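Because the actual variable and constant names live inside the individual scripts, the sketch below only illustrates the pattern; every identifier in it is hypothetical and should be replaced with the names found in the relevant script.

```python
# Illustration only: all names below are hypothetical, not the repository's
# real identifiers. Check each script for the actual environment variables
# and horizon constants it reads.
import os

# Split windows and aggregation cadence are taken from environment variables
# in several stages; they would be set before launching a script, e.g.:
os.environ.setdefault("TRAIN_SPLIT_END", "2018-12-31")        # hypothetical name
os.environ.setdefault("VALIDATION_SPLIT_END", "2020-12-31")   # hypothetical name
os.environ.setdefault("AGGREGATION_CADENCE_HOURS", "6")       # hypothetical name

# Horizon selection is a constant defined inside the training and calibration
# scripts; editing something of this form selects which horizons are produced:
HORIZONS = range(1, 9)  # horizons 1-8, per this README
```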