diff --git a/README.md b/README.md
index d59f8ee..b693107 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,97 @@ This project, undertaken in collaboration with Dr. Miles Corak, examines the pro
 * **Analyze Poverty Trends:** Conduct in-depth, cross-national analyses of poverty trends, drawing on the harmonized LIS microdata.
 * **Evaluate Policy Impacts:** Assess the impact of various social and economic policies on poverty, using our standardized dataset to draw meaningful comparisons.
 
+## Repository Structure
+
+```
+├── DART/                          # DART validation and methodological notes
+│   ├── MIMA/                      # MIMA-related CSV data and visualizations
+│   └── Methodological_Notes.md    # Documentation of methodology
+├── LISSY/                         # Core LIS analysis and validation
+│   ├── Official_pr_Analysis/      # Main poverty rate analysis pipeline (see below)
+│   ├── DART_Validation/           # Validation against DART tables
+│   ├── MBM_validation/            # Market Basket Measure validation
+│   ├── MIMA5/                     # MIMA-5 specific analysis
+│   └── Tutorial/                  # Tutorial materials
+├── METIS-LIS/                     # METIS-LIS integration and codebooks
+├── analysis/                      # Data availability analysis
+├── docs/                          # Project documentation
+├── present-Nov21/                 # Presentation materials (2000, 2008, 2018 base years)
+├── scripts/                       # Utility scripts (e.g., HTML to MD conversion)
+├── xlsxConverted/                 # Converted Excel files (CSV, JSON, MD formats)
+├── xlsxFiles/                     # Original Excel source files
+├── compute_mima.py                # Core MIMA computation script
+├── convert_excel.py               # Excel file conversion utility
+└── USAGE_GUIDE.md                 # Usage guide for the project
+```
+
+## Official_pr_Analysis Folder
+
+The `LISSY/Official_pr_Analysis/` folder contains the **official poverty rate analysis pipeline**, the core analytical workflow for computing and validating poverty rates across multiple countries using Luxembourg Income Study (LIS) microdata.
+
+### Purpose
+
+This folder implements the **MIMA (Median Income Moving Average)** methodology for calculating poverty rates and validates these calculations against official government benchmarks from multiple countries.
+
+### Structure
+
+```
+Official_pr_Analysis/
+├── benchmarks/                                # Official poverty rate benchmarks by country
+│   ├── ca/                                    # Canada (Market Basket Measure benchmarks)
+│   ├── de/                                    # Germany (Armutsgefährdungsquoten data)
+│   ├── eu/                                    # European Union (EU-SILC methodology)
+│   ├── uk/                                    # United Kingdom (HBAI statistics)
+│   └── us/                                    # United States (Census Bureau poverty rates)
+├── lissy_data/                                # LIS microdata outputs and processing
+│   ├── _SCRIPTS/                              # Job parsing scripts (parse_lissy_job.py)
+│   ├── ca/, de/, uk/, us/                     # Country-specific LIS job outputs
+│   ├── mima_algorithm_explanation.md          # Detailed MIMA algorithm documentation
+│   └── parse_lissy_job_guide.md               # Guide for parsing LIS job logs
+├── results/                                   # Analysis outputs and visualizations
+│   ├── ca/                                    # Canada results (CSV files and plots)
+│   └── us/                                    # United States results
+└── scripts/                                   # Analysis and visualization scripts
+    ├── run_analysis.py                        # Main runner for optimization and visualization
+    ├── run_multi_benchmark_analysis.py        # Multi-benchmark comparison runner
+    ├── parse_lis_output.py                    # Parser for LIS output files
+    ├── calculate_and_plot_mima_from_csv.py    # MIMA calculation from CSV
+    ├── plot_mima_difference.py                # MIMA difference visualization
+    ├── plot_npoor_analysis.py                 # Number of poor analysis plots
+    ├── plot_specific_rates.py                 # Specific rate plotting utilities
+    ├── plot_us_vs_benchmarks.py               # US vs benchmark comparison plots
+    └── single_file_optimize.py                # Single-file optimization script
+```
+
+### Workflow
+
+1. **Data Extraction**: R scripts run on the LIS LISSY platform to extract household-level data with income variables (`dhi`, `mhi`), household weights (`hpopwgt`), and demographic variables.
+
+2. **Job Parsing**: The `parse_lissy_job.py` script automatically parses LIS job outputs, extracting CSV data from log files and saving them with dynamic filenames.
+
+3. **MIMA Calculation**: The algorithm computes the Median Income Moving Average using a configurable window size and calculates poverty lines as a fraction (α) of MIMA.
+
+4. **Optimization**: Scripts sweep across parameter combinations (α = alpha, w = window size) to find optimal settings that best match official government poverty rate benchmarks.
+
+5. **Visualization**: Multiple visualization scripts generate comparative plots showing calculated poverty rates against official benchmarks.
+
+### Key Algorithm (MIMA)
+
+The MIMA algorithm (documented in `mima_algorithm_explanation.md`) calculates poverty rates using:
+
+- **Equivalized Income**: Household income adjusted by square root of household size
+- **Moving Average Median**: Trailing average of annual median incomes over w years
+- **Poverty Line**: α × MIMA (where α typically ranges from 0.4 to 0.65)
+- **Poverty Rate**: Proportion of population below the poverty line, weighted by person-level population weights
+
+### Countries Analyzed
+
+- **Canada (ca)**: Validated against Market Basket Measure (MBM) data
+- **United States (us)**: Validated against Census Bureau historical poverty rates
+- **United Kingdom (uk)**: Validated against Households Below Average Income (HBAI) statistics
+- **Germany (de)**: Validated against Armutsgefährdungsquoten (at-risk-of-poverty rate) data
+- **European Union (eu)**: EU-SILC methodology reference
+
 ## Methodology
 
 TBD
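
As a rough illustration of the Key Algorithm steps (equivalization, trailing average of annual medians, poverty line, weighted poverty rate) and of the parameter sweep in the Workflow, here is a minimal Python sketch. It is not the code in `compute_mima.py` or `run_analysis.py`. The function names, the in-memory data layout, the person weight taken as household weight times household size, the averaging over whichever years are available in the window, and the mean-absolute-error objective are all assumptions made for illustration.

```python
from math import sqrt


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


def equivalize(income, household_size):
    """Square-root equivalence scale: income / sqrt(household size)."""
    return income / sqrt(household_size)


def mima(medians_by_year, year, window):
    """Trailing mean of annual medians over `window` years ending at `year`.

    Assumption: years missing from the data are skipped rather than imputed.
    """
    years = [y for y in range(year - window + 1, year + 1) if y in medians_by_year]
    return sum(medians_by_year[y] for y in years) / len(years)


def mima_poverty_rate(data, year, alpha, window):
    """Person-weighted share of people below alpha * MIMA in `year`.

    `data` maps year -> list of (household_income, household_size, household_weight).
    Assumption: person weight = household weight * household size.
    """
    equivalized = {
        y: [(equivalize(income, size), weight * size) for income, size, weight in rows]
        for y, rows in data.items()
    }
    medians = {
        y: weighted_median([inc for inc, _ in rows], [w for _, w in rows])
        for y, rows in equivalized.items()
    }
    line = alpha * mima(medians, year, window)
    total = sum(w for _, w in equivalized[year])
    poor = sum(w for inc, w in equivalized[year] if inc < line)
    return poor / total


def best_parameters(data, benchmarks, alphas, windows):
    """Grid search over (alpha, window) against benchmarks {year: official_rate}.

    Assumption: the objective is mean absolute error; the project may use another.
    """
    def error(alpha, window):
        gaps = [abs(mima_poverty_rate(data, y, alpha, window) - rate)
                for y, rate in benchmarks.items()]
        return sum(gaps) / len(gaps)

    return min(((a, w) for a in alphas for w in windows), key=lambda p: error(*p))
```

Calling `best_parameters(data, benchmarks, alphas=[0.4, 0.5, 0.6], windows=[1, 3, 5])` with per-year household tuples and a `{year: rate}` dictionary of official rates would return the `(alpha, window)` pair with the smallest average gap. Keeping the equivalization, median, and rate steps as separate functions makes it straightforward to swap in a different equivalence scale or weighting rule once the methodology section is finalized.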