This project develops a machine learning framework to predict freshwater stress globally using environmental and socioeconomic indicators across 25 years (2001–2025).
The study integrates feature engineering, dynamic lag modeling, explainable AI (SHAP), scenario analysis, uncertainty quantification, and geospatial visualization to identify the drivers of freshwater depletion and rank countries by risk.
- Can AI accurately predict freshwater stress using multi-source sustainability indicators?
- Which factors (urbanization, agriculture, industry, rainfall) contribute most to freshwater stress?
- Can explainable AI provide interpretable insights for decision-makers?
- How might future socioeconomic and industrial changes impact water stress?
- Higher urbanization and industrial activity → higher freshwater stress
- Agricultural intensity strongly contributes to water stress
- Low rainfall or rainfall anomalies → higher freshwater stress
- Dynamic lag features improve predictive performance
- Water demand indicators are stronger predictors than climate alone
- Global panel dataset of ~200 countries (2001–2025)
- 10 key indicators:
  - Water stress
  - Rainfall / precipitation
  - Urban population & density
  - Industrial & agricultural activity
  - Water demand ratios and interactions
Source: World Bank Open Data and other public sustainability datasets.
- Data Preprocessing: handle missing values, reshape into panel data
- Feature Engineering: create dynamic lag features, interaction variables
- Machine Learning Models: Random Forest, XGBoost
- Explainable AI: SHAP values and feature importance
- Scenario Analysis: simulate urbanization and industrial growth
- Uncertainty Estimation: bootstrap confidence intervals
- Geospatial Visualization: global map and top 20 risk countries
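The bootstrap step in the list above can be sketched as follows. This is a minimal, standard-library illustration of a percentile bootstrap interval for MAE, not the project's actual code: the function name `bootstrap_mae_ci`, its parameters, and the sample values are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_mae_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean absolute error.

    Hypothetical helper for illustration; returns (mae, lower, upper).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    rng = random.Random(seed)
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_errors)
    # Resample the per-row errors with replacement and record each resample's MAE.
    resampled_maes = sorted(
        statistics.fmean(rng.choices(abs_errors, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled MAEs.
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = int((1 - alpha / 2) * n_resamples) - 1
    return statistics.fmean(abs_errors), resampled_maes[lower_idx], resampled_maes[upper_idx]


# Made-up observed and predicted stress values, for illustration only.
y_true = [10.0, 25.0, 40.0, 80.0, 120.0, 300.0]
y_pred = [12.0, 22.0, 41.0, 75.0, 126.0, 290.0]
mae, lower, upper = bootstrap_mae_ci(y_true, y_pred)
print(f"MAE = {mae:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Resampling individual rows treats country-years as independent. For panel data, resampling whole countries (a block bootstrap) may give more realistic intervals.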
| Model | MAE | R² | Accuracy |
|---|---|---|---|
| Random Forest (Dynamic Lag) | 2.57 | 0.992 | 99.4% |
| Before Lag Features | 52.2 | 0.28 | 78.4% |
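Interpretation: Adding dynamic lag features reduced MAE from 52.2 to 2.57 and raised R² from 0.28 to 0.992. The SHAP ranking suggests that most of this gain comes from the previous year's water stress value.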
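To make the lag features concrete, here is a minimal sketch of a one-year lag built per country, so that one country's history never leaks into another's. The helper `add_lag1`, the column names (`country`, `year`, `water_stress`), and the toy rows are assumptions for illustration, not the project's actual schema.

```python
def add_lag1(rows, value_key, group_key="country", time_key="year"):
    """Return new rows with `<value_key>_lag1`: the same country's value one year earlier.

    Hypothetical helper for illustration; the input rows are not modified.
    """
    by_group_year = {(r[group_key], r[time_key]): r[value_key] for r in rows}
    out = []
    for r in sorted(rows, key=lambda r: (r[group_key], r[time_key])):
        lagged = dict(r)
        # None when the previous year is missing (first year or a gap in the panel).
        lagged[f"{value_key}_lag1"] = by_group_year.get((r[group_key], r[time_key] - 1))
        out.append(lagged)
    return out


# Toy panel with made-up values, for illustration only.
panel = [
    {"country": "A", "year": 2001, "water_stress": 50.0},
    {"country": "A", "year": 2002, "water_stress": 55.0},
    {"country": "A", "year": 2003, "water_stress": 53.0},
    {"country": "B", "year": 2001, "water_stress": 5.0},
    {"country": "B", "year": 2002, "water_stress": 6.0},
]
lagged_panel = add_lag1(panel, "water_stress")
for row in lagged_panel:
    print(row)
```

With pandas, a comparable result would typically come from sorting by country and year and calling `groupby("country")["water_stress"].shift(1)`. That shifts by row rather than by calendar year, so the two approaches differ when a country has missing years.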
| Rank | Country | Predicted Stress |
|---|---|---|
| 1 | Kuwait | 3850 |
| 2 | UAE | 1509 |
| 3 | Saudi Arabia | 974 |
| 4 | Libya | 817 |
| 5 | Qatar | 431 |
| Feature | Mean SHAP |
|---|---|
| Previous Year Water Stress | 61.86 |
| Urban Pressure | 2.01 |
| Urban Pressure Lag1 | 1.57 |
| Agricultural Intensity Lag1 | 0.32 |
| Industrial-Urban Interaction | 0.27 |
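A ranking like the one in the table is commonly obtained by averaging the absolute SHAP values of each feature over all samples. The sketch below assumes that is what "Mean SHAP" refers to here; the function `mean_abs_shap`, the feature names, and the SHAP values are made up for illustration.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    `shap_rows` is a list of per-sample lists, one SHAP value per feature
    (the shape a SHAP explainer typically returns for a regression model).
    """
    if not shap_rows:
        raise ValueError("shap_rows must not be empty")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            # Use the absolute value so positive and negative effects do not cancel.
            totals[j] += abs(value)
    n_samples = len(shap_rows)
    ranking = [(name, total / n_samples) for name, total in zip(feature_names, totals)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Made-up SHAP values for two samples and two features, for illustration only.
features = ["water_stress_lag1", "urban_pressure"]
shap_rows = [[10.0, -1.0], [-30.0, 3.0]]
ranking = mean_abs_shap(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because the values are averaged in absolute terms, the ranking shows how much each feature moves predictions, not the direction of its effect.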
- Dark red indicates high stress
- Grey indicates missing/no data
- Systems-level predictive modeling of freshwater stress
- 25-year global panel analysis
- Explainable AI (SHAP) to interpret drivers
- Feature engineering for sustainability indicators
- Policy-relevant insights for water governance
- Country-level aggregation may hide sub-national water stress
- Limited hydrological variables: groundwater and surface water data are not included
- Predictions assume that historical trends continue
- Scenario-based forecasting for 2030+
- Integration of groundwater and surface water datasets
- Alternative ML models: LightGBM, temporal deep learning
- Advanced uncertainty and sensitivity analysis
- Python 3.12
- Pandas, NumPy
- Scikit-learn, XGBoost
- SHAP
- GeoPandas, Matplotlib, Seaborn
All Rights Reserved.
Usage requires explicit permission from the author.
Contact: meenuhani27@gmail.com

