Machine learning analysis to identify the top 5 socioeconomic determinants of suicide rates using World Bank (WDI) and WHO data. Systematic reduction from 1509 → 5 interpretable variables focused on mental health public policy.
Read the full paper | View results
Decision Tree (best model): R² = 0.82, MSE = 8.56
Lasso Regression: R² = 0.24, MSE = 36.82
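Metrics like the ones above can be obtained by scoring both model families under the same 5-fold cross-validation split. The following is a minimal sketch on synthetic data, not the repository's actual script: the data, `max_depth=5`, and `alpha=0.1` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 200 rows, 5 features, partly non-linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] > 0, 10.0, 0.0) + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hyperparameters here are illustrative, not the values used in the study.
models = {
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "lasso": Lasso(alpha=0.1),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    result = cross_validate(
        model, X, y, cv=cv, scoring=("r2", "neg_mean_squared_error")
    )
    scores[name] = {
        "r2": result["test_r2"].mean(),
        # scikit-learn reports negated MSE, so flip the sign back.
        "mse": -result["test_neg_mean_squared_error"].mean(),
    }
    print(f"{name}: R2 = {scores[name]['r2']:.2f}, MSE = {scores[name]['mse']:.2f}")
```

Sharing one `KFold` object across models keeps the folds identical, so the two scores are directly comparable.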
| Variable | Importance | Effect | Interpretation |
|---|---|---|---|
| Female labor force participation | 0.296 | β¬οΈ | "Double burden" stress |
| Population density | 0.284 | β¬οΈ | Access to services/social networks |
| Industrial employment | 0.169 | β¬οΈ | Adverse working conditions |
| Access to electricity | 0.137 | β¬οΈ | Development indicator |
| Private health spending | 0.115 | β¬οΈ | Access to mental health |
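Importance scores of this kind are what a fitted scikit-learn decision tree exposes as `feature_importances_`. The sketch below shows how such a ranking, plus a rough direction of association, could be derived. It uses synthetic data and hypothetical feature names, and the direction heuristic (sign of the correlation between a feature and the tree's predictions) is an assumption for illustration, not necessarily the method used in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 3))
# Synthetic target: rises with feature_a, falls with feature_b, ignores feature_c.
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = tree.feature_importances_

# Crude direction estimate (assumed heuristic): sign of the correlation
# between each feature and the tree's own predictions.
predictions = tree.predict(X)
directions = [
    float(np.sign(np.corrcoef(X[:, j], predictions)[0, 1]))
    for j in range(X.shape[1])
]

for name, imp, sign in sorted(
    zip(feature_names, importances, directions), key=lambda row: -row[1]
):
    print(f"{name}: importance = {imp:.3f}, direction = {'+' if sign > 0 else '-'}")
```

A single correlation sign hides non-monotonic effects, which is one reason model-agnostic attribution methods are often preferred for this step.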
Analysis of 185 countries (2000-2021) integrating World Bank and WHO data to identify the socioeconomic factors that most influence suicide rates. Complete pipeline: data collection → variable selection → modeling → interpretation.
Why does this matter? More than 720,000 people die by suicide each year (WHO). Identifying actionable determinants helps develop effective public policies (aligned with UN SDGs).
- Data: 1509 WDI variables + age-standardized suicide rates (WHO)
- Selection: Correlation filters + Maximum Independent Set → 74 variables
- Modeling: Decision Trees + Lasso Regression (5-fold CV)
- Interpretation: Importance analysis + direction of associations
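The selection step above can be pictured as a graph problem: variables are nodes, an edge joins any two variables whose correlation is too high, and an independent set is a group of variables with no such edge between them. Finding a true maximum independent set is NP-hard, so the sketch below swaps in a greedy lowest-degree heuristic. The `0.8` threshold, the function names, and the toy variable names are all assumptions; the repository's implementation may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def greedy_independent_set(columns, threshold=0.8):
    """Keep variables so that no two kept ones have |r| >= threshold.

    columns: dict mapping variable name -> list of values.
    Greedy heuristic (fewest conflicts first); not guaranteed to be maximum.
    """
    # Build the conflict graph: an edge joins two highly correlated variables.
    neighbors = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    kept, remaining = [], set(columns)
    while remaining:
        # Take the variable with the fewest remaining conflicts (ties by name).
        best = min(remaining, key=lambda v: (len(neighbors[v] & remaining), v))
        kept.append(best)
        # Drop the chosen variable and everything that conflicts with it.
        remaining -= neighbors[best] | {best}
    return sorted(kept)


# Toy data with hypothetical variable names (not real WDI values).
toy = {
    "gdp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gdp_per_capita": [1.1, 2.0, 3.2, 3.9, 5.1],  # nearly collinear with gdp
    "density": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = greedy_independent_set(toy)
print(selected)  # ['density', 'gdp']
```

Here `gdp` and `gdp_per_capita` conflict, so only one of them survives, while `density` is kept because it is weakly correlated with both.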
# Install requirements
pip install -r requirements.txt
# Run best model (Decision Tree with 5 key variables)
python scripts/modeling/generic_regression_crossval.py --model decision_tree --mode interpretable
# Generate comparison plots
python scripts/plot_comparacao_modelos.py
├── data/
│   ├── raw/                # Original API data
│   ├── interim/            # Intermediate processing
│   └── processed/          # Final datasets
├── scripts/
│   ├── data_processing/    # Extraction and processing
│   └── modeling/           # ML models
├── results/                # Model outputs
└── src/                    # Utilities and feature selection
pip install -r requirements.txt
Main dependencies: Python 3.8+, scikit-learn, pandas, numpy, matplotlib, seaborn, requests
- Systematic reduction 1509 → 5 interpretable variables
- Directionality analysis (decision trees)
- Reproducible pipeline for epidemiological studies
- Actionable insights for suicide prevention
Limitations:
- Country-level aggregated data, so conclusions about individuals risk the ecological fallacy
- Imputation required for all observations
- Directionality analysis could be improved with SHAP or LIME
Luan Pereira Pinheiro and Sofia Leopoldo - University of São Paulo
Full Article: PDF | Data & Code: Available in this repository
This research contributes to understanding socioeconomic factors in suicide prevention through interpretable ML, aligned with UN SDGs.

