EEbrami · Copilot · Oct 22, 2025
diff --git a/DART/README.md b/DART/README.md
@@ -0,0 +1,62 @@
+# DART/
+
+## Purpose
+
+Validation, analysis, and visualization of DART (Data Access Research Tool) tables derived from the LIS Key Figures. Contains scripts that compare LISSY microdata results with official DART tables and produce median income plots.
+
+## Contents
+
+- `dart_validation.py` - Validates LISSY results against DART median income tables
+- `plot_dart_tables.py` - Generates visualizations from DART CSV tables
+- `Methodological_Notes.md` - LIS Key Figures methodology (population coverage, income concepts, equivalence scales)
+- `Methodological_Remarks.md` - Extended methodological documentation
+- `dart-table_*.csv` - DART reference tables (DHI median, poverty rates)
+- `dart_*_plot.png` - Generated visualizations
+- **MIMA/** - Median income moving average (MIMA) workflow (detailed README inside)
+
+## Quick start
+
+**Validate LISSY vs DART:**
+```bash
+python DART/dart_validation.py
+# Outputs: dart_dhi_median_validation.csv, dart_dhi_median_error_facts.txt
+```
+
+**Plot DART tables:**
+```bash
+python DART/plot_dart_tables.py
+# Outputs: PNG plots in DART/
+```
+
+**Run MIMA workflow:**
+```bash
+python compute_mima.py \
+  --ma-number 5 \
+  --countries "Canada,Germany,Luxembourg,United Kingdom,United States" \
+  --start-year 1985 --end-year 2021 \
+  --input-path "xlsxConverted/csvFiles/dart-med-pop_decomp-dhi.csv" \
+  --output-path "DART"
+# Outputs: DART/MIMA/csv/ and DART/MIMA/visualizations/
+```
+
+See `DART/MIMA/README.md` for full MIMA documentation.
+
+## Conventions
+
+- CSV tables use countries as rows, years as columns
+- Scripts run from repository root (not from DART/ directory)
+- Validation scripts compare LISSY outputs to DART tables and report error statistics
+
+## Privacy & Secrets
+
+No microdata is stored here - only aggregated tables and validation outputs. LISSY jobs must be run separately on the LIS remote server.
+
+## Related Folders
+
+- **LISSY/DART_Validation/** - Alternative validation of LISSY job outputs (produced by R/Stata jobs) against DART tables
+- **xlsxConverted/csvFiles/** - Source DART tables in CSV format
+- **compute_mima.py** (root) - MIMA computation script
+
+## Maintainers
+
+DART tables sourced from [LIS DART Portal](https://www.lisdatacenter.org/data-access/dart/).
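The `--ma-number 5` option in the DART README above suggests a five-year moving average of median income. The sketch below shows one way such an average could be computed for a single country row of a wide table. It is an illustration only: the function name `trailing_moving_average`, the trailing (backward-looking) window, and the toy numbers are assumptions, and `compute_mima.py` may use a different window alignment or handle gaps differently.

```python
from statistics import mean


def trailing_moving_average(series, window=5):
    """Return {year: mean of the last `window` observations ending at that year}.

    `series` maps year -> value for one country. Years without a full
    window of earlier observations are left out. Gaps between years are
    NOT interpolated; the window counts available observations only.
    """
    years = sorted(series)
    averaged = {}
    for i, year in enumerate(years):
        if i + 1 < window:
            continue  # not enough history yet
        window_years = years[i + 1 - window : i + 1]
        averaged[year] = mean(series[y] for y in window_years)
    return averaged


# Toy values for illustration, not taken from any DART table.
toy_median_income = {
    2000: 100.0, 2001: 110.0, 2002: 120.0,
    2003: 130.0, 2004: 140.0, 2005: 150.0,
}
mima5 = trailing_moving_average(toy_median_income, window=5)
print(mima5)  # {2004: 120.0, 2005: 130.0}
```

A trailing window was chosen here because it never uses future years; a centered window would be an equally plausible reading of "moving average".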
diff --git a/LISSY/DART_Validation/README.md b/LISSY/DART_Validation/README.md
@@ -0,0 +1,61 @@
+# LISSY/DART_Validation/
+
+## Purpose
+
+Validates LISSY microdata analysis results against official DART aggregated tables. Compares median income and poverty rates computed from LIS microdata (via LISSY jobs) with published DART Key Figures.
+
+## Contents
+
+- `validate_lissy_vs_dart.py` - Python validation script comparing LISSY vs DART for DHI/MHI metrics
+- `R_code_steps.md` - Detailed R code methodology documentation (DART compliance steps)
+- `lissy_pop_median_*.csv` - LISSY job outputs (median income and poverty rates, PPP-adjusted)
+- `dart_table_*.csv` - DART reference tables (downloaded from LIS DART portal)
+- `comparison_*.png` - Scatter plots showing LISSY vs DART agreement
+- `error_moments_*.csv` - Error statistics (mean, std, skew, kurtosis) by country
+
+## Quick start
+
+**Run validation:**
+```bash
+cd LISSY/DART_Validation
+python validate_lissy_vs_dart.py
+```
+
+**Outputs:**
+- `comparison_*.png` - Visual comparisons (scatter plots with 45° line)
+- `error_moments_*.csv` - Statistical summaries of discrepancies
+
+## Inputs Required
+
+1. **DART tables** (already present):
+   - `dart_table_dhi_median.csv`, `dart_table_dhi_pr.csv` (DHI median/poverty rate)
+   - `dart_table_mhi_median.csv`, `dart_table_mhi_pr.csv` (MHI median/poverty rate)
+
+2. **LISSY outputs** (run LISSY jobs separately):
+   - `lissy_pop_median_dhi_ppp_median_85-21.csv`
+   - `lissy_pop_median_mhi_ppp_median_85-21.csv`
+
+   These files must be generated by running R/Stata jobs on the LISSY remote system (see `LISSY/Tutorial/` for how to submit jobs).
+
+## Conventions
+
+- LISSY outputs use long format (country, year, value columns)
+- DART tables use wide format (countries as rows, years as columns)
+- Validation compares PPP-adjusted values (2017 USD)
+- Error moments help identify systematic biases or noisy countries
+
+## Privacy & Secrets
+
+**Important:** LISSY jobs access LIS microdata under strict privacy rules. Do NOT commit microdata to this repo. Only aggregated outputs (medians, poverty rates) are stored here.
+
+See `LISSY/README.md` for LISSY registration and job submission guidelines.
+
+## Related Folders
+
+- **DART/** - Alternative validation against DART tables (`dart_validation.py`)
+- **LISSY/Tutorial/** - LISSY onboarding and syntax examples
+- **DART/Methodological_Notes.md** - DART computation methodology
+
+## Maintainers
+
+Validation pipeline for ensuring LISSY job outputs match published DART figures.
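The Conventions section of the README above describes two table shapes (wide DART tables, long LISSY outputs) and per-country error moments. The following sketch, using only the Python standard library, shows how the two shapes could be aligned and the moments computed. The column name `country`, the helper names, the toy values, and the use of population (not sample-adjusted) moments are all assumptions; `validate_lissy_vs_dart.py` may define the errors and the estimators differently.

```python
import csv
import io
from statistics import fmean, pstdev


def wide_to_long(wide_csv_text, id_column="country"):
    """Turn a wide table (country rows, year columns) into {(country, year): value}."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    long_table = {}
    for row in reader:
        country = row[id_column]
        for column, cell in row.items():
            if column == id_column or not cell or not cell.strip():
                continue  # skip the id column and empty cells
            long_table[(country, int(column))] = float(cell)
    return long_table


def error_moments(errors):
    """Mean, population std, skewness, and excess kurtosis of a list of errors."""
    mu = fmean(errors)
    sd = pstdev(errors, mu)
    if sd == 0:
        return {"mean": mu, "std": 0.0, "skew": 0.0, "kurtosis": 0.0}
    z = [(e - mu) / sd for e in errors]
    return {
        "mean": mu,
        "std": sd,
        "skew": fmean(v ** 3 for v in z),
        "kurtosis": fmean(v ** 4 for v in z) - 3.0,
    }


# Toy data for illustration only (countries "A" and "B" are placeholders).
DART_WIDE = "country,2000,2001,2002\nA,100,110,120\nB,200,,220\n"
LISSY_LONG = [
    ("A", 2000, 101.0), ("A", 2001, 109.0), ("A", 2002, 120.0),
    ("B", 2000, 198.0), ("B", 2001, 210.0), ("B", 2002, 222.0),
]

dart = wide_to_long(DART_WIDE)
errors_by_country = {}
for country, year, value in LISSY_LONG:
    if (country, year) in dart:  # compare only country-years present in both
        errors_by_country.setdefault(country, []).append(value - dart[(country, year)])

moments = {c: error_moments(e) for c, e in errors_by_country.items()}
for country, stats in moments.items():
    print(country, stats)
```

Errors are defined here as LISSY minus DART in levels; a percentage error would be an equally reasonable choice and only changes the subtraction line.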
diff --git a/LISSY/MIMA5/README.md b/LISSY/MIMA5/README.md
@@ -0,0 +1,60 @@
+# LISSY/MIMA5/
+
+## Purpose
+
+MIMA5 (5-year Moving Average of Median Income) poverty rate analysis and visualizations. Contains LISSY job outputs computing poverty rates anchored to the 5-year moving average of median income, comparing DHI and MHI across countries.
+
+## Contents
+
+- `plotting_mima5_pr.py` - Generates 4 plots comparing poverty rates and MIMA5 trends
+- `lissy_mima5_*.csv` - LISSY outputs (MIMA5-based poverty rates for DHI/MHI)
+- `lissy_CPI_mima5_*.csv` - CPI-adjusted MIMA5 poverty rates
+- `*.png` - Visualizations (poverty rates and MIMA5 time series)
+- **OLD/** - Archived outputs from previous runs
+
+## Quick start
+
+**Generate plots from existing CSV files:**
+```bash
+cd LISSY/MIMA5
+python plotting_mima5_pr.py
+```
+
+**Outputs:**
+- `mima5_dhi_50pp_pr.png` - DHI poverty rate (50% of MIMA5)
+- `mima5_mhi_50pp_pr.png` - MHI poverty rate (50% of MIMA5)
+- `mima5_dhi.png` - MIMA5 DHI time series
+- `mima5_mhi.png` - MIMA5 MHI time series
+- CPI-adjusted variants: `CPI_mima5_*.png`
+
+## Inputs Required
+
+**LISSY job outputs** (must be generated separately on LISSY):
+- `lissy_mima5_dhi_50pr.csv` - DHI poverty rate @ 50% MIMA5
+- `lissy_mima5_mhi_50pr.csv` - MHI poverty rate @ 50% MIMA5
+- `lissy_CPI_mima5_dhi_50pr.csv` - CPI-adjusted DHI
+- `lissy_CPI_mima5_mhi_50pr.csv` - CPI-adjusted MHI
+
+Run R/Stata jobs on LISSY to compute these (see `LISSY/Tutorial/` for syntax).
+
+## Conventions
+
+- CSV files use long format: `country, year, pr, mima5`
+- Plots fix country colors: Canada (green), Germany (red), UK (orange), US (blue)
+- Outputs are PNG format (300 DPI recommended for publication)
+
+## Privacy & Secrets
+
+Only aggregated poverty rates and medians are stored. **Do NOT commit LIS microdata.**
+
+LISSY jobs must be submitted via the [LISSY web interface](https://www.lisdatacenter.org/data-access/lissy/) or email. See `LISSY/README.md` for registration.
+
+## Related Folders
+
+- **DART/MIMA/** - MIMA computation workflow (using DART tables, not LISSY microdata)
+- **METIS-LIS/mima_indicator.md** - MIMA methodology documentation
+- **compute_mima.py** (root) - Python MIMA workflow for DART data
+
+## Maintainers
+
+MIMA5 analysis for poverty persistence research using LIS microdata.
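The README above describes poverty rates "anchored" at 50% of MIMA5. As a rough illustration of what that means, the sketch below computes the weighted share of units whose income falls below half of a given MIMA5 value. The real computation runs as R/Stata jobs on LISSY against microdata; this Python version uses synthetic numbers, and the function name, the strict `<` comparison, and the weighting scheme are assumptions rather than the project's method.

```python
def poverty_rate(incomes, weights, mima5, share=0.5):
    """Weighted share of units with income strictly below `share` * `mima5`."""
    if len(incomes) != len(weights):
        raise ValueError("incomes and weights must have the same length")
    poverty_line = share * mima5
    total_weight = sum(weights)
    poor_weight = sum(w for y, w in zip(incomes, weights) if y < poverty_line)
    return poor_weight / total_weight


# Synthetic values for illustration only; no LIS microdata is involved.
incomes = [4_000.0, 9_000.0, 12_000.0, 20_000.0, 35_000.0]
weights = [1.0, 1.0, 2.0, 1.0, 1.0]

rate = poverty_rate(incomes, weights, mima5=20_000.0)
print(f"poverty rate at 50% of MIMA5: {rate:.3f}")
```

Because the line is tied to a multi-year average, a single year's change in median income moves the threshold less than it would with a same-year median.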
diff --git a/LISSY/README.md b/LISSY/README.md
@@ -0,0 +1,73 @@
+# LISSY/
+
+## Purpose
+
+LISSY (LIS remote-execution system) documentation, tutorials, and outputs. This folder contains onboarding materials, validation scripts, and analysis results from LIS microdata jobs.
+
+## Contents
+
+- **Tutorial/** - LISSY onboarding, syntax examples, and exercises (comprehensive README inside)
+- **DART_Validation/** - Validates LISSY job outputs against DART aggregated tables
+- **MIMA5/** - MIMA5 poverty rate analysis and visualizations
+
+## Quick start
+
+**New to LISSY?** Start with the Tutorial:
+```bash
+# Read the tutorial README
+cat LISSY/Tutorial/README.md
+
+# Browse R and Stata syntax examples
+ls LISSY/Tutorial/Exercises_syntax_files-R-Part_II/
+```
+
+**Validate your LISSY results:**
+```bash
+cd LISSY/DART_Validation
+python validate_lissy_vs_dart.py
+```
+
+**Plot MIMA5 poverty rates:**
+```bash
+cd LISSY/MIMA5
+python plotting_mima5_pr.py
+```
+
+## What is LISSY?
+
+LISSY is a remote-execution system that allows researchers to access [LIS](https://www.lisdatacenter.org/) and [LWS](https://www.lisdatacenter.org/data-access/lws/) microdata while adhering to privacy restrictions. Researchers submit statistical programs (R, SAS, SPSS, Stata) through a web-based interface, and LISSY returns aggregated results.
+
+## How to Register
+
+[Register for LISSY access](https://www.lisdatacenter.org/data-access/lissy/) (1-year access, renewable annually).
+
+## Privacy & Secrets
+
+**Critical:** LIS microdata is confidential. NEVER commit microdata to this repository.
+
+- Submit jobs via the [LISSY web interface](https://www.lisdatacenter.org/data-access/lissy/)
+- Only commit **aggregated outputs** (tables, plots, summary statistics)
+- Committing individual-level data violates the LIS terms of use
+- See the [LIS Terms of Use](https://www.lisdatacenter.org/about-lis/terms-of-use/)
+
+## Onboarding Resources
+
+- **Tutorial/** folder in this repo (syntax examples, exercises)
+- [LIS Self-Teaching Materials](https://www.lisdatacenter.org/resources/self-teaching/)
+- [METIS Documentation Portal](https://www.lisdatacenter.org/frontend)
+- [LIS FAQ](https://www.lisdatacenter.org/resources/faq/)
+- Contact: [usersupport@lisdatacenter.org](mailto:usersupport@lisdatacenter.org)
+
+## Citation
+
+All papers using LIS microdata must be submitted to the LIS Working Paper series before publication. See [General Policies](https://www.lisdatacenter.org/working-papers/#general).
+
+## Related Folders
+
+- **METIS-LIS/** - LIS codebooks and variable documentation
+- **DART/** - DART validation using aggregated tables (no microdata)
+- **compute_mima.py** (root) - MIMA workflow using DART tables
+
+## Maintainers
+
+Documentation and examples sourced from [LIS Cross-National Data Center](https://www.lisdatacenter.org/).
diff --git a/METIS-LIS/README.md b/METIS-LIS/README.md
@@ -0,0 +1,41 @@
+# METIS-LIS/
+
+## Purpose
+
+Documentation and metadata for LIS (Luxembourg Income Study) datasets, including codebooks, MIMA indicator definitions, and wave/date mappings.
+
+## Contents
+
+- `codebook.pdf` - LIS variable codebook (names, definitions, codes)
+- `mima_indicator.md` - MIMA (Median Income Moving Average) indicator methodology
+- `waves-and-dates.md` - LIS data collection waves and reference dates
+
+## Quick start
+
+**View codebook:**
+```bash
+open METIS-LIS/codebook.pdf  # macOS
+xdg-open METIS-LIS/codebook.pdf  # Linux
+```
+
+**Review MIMA methodology:**
+```bash
+cat METIS-LIS/mima_indicator.md
+```
+
+## Conventions
+
+- Files are reference documentation (read-only)
+- PDF codebook is the authoritative source for LIS variable definitions
+- Markdown files provide concise summaries for quick reference
+
+## Related Resources
+
+- [LIS METIS Portal](https://www.lisdatacenter.org/frontend) - Full online documentation
+- [LIS Database](https://www.lisdatacenter.org/) - Official LIS homepage
+- **DART/MIMA/** - Implementation of MIMA methodology
+- **LISSY/** - Remote execution system for LIS microdata
+
+## Maintainers
+
+Documentation sourced from [LIS Cross-National Data Center](https://www.lisdatacenter.org/).
diff --git a/README.md b/README.md
@@ -27,3 +27,29 @@ This project serves to:
 * **Inform Policy Evaluation:** Offer policymakers a tool for evidence-based assessment of poverty alleviation programs and policies.
 * **Enable Comparative Studies:** Facilitate cross-national and cross-temporal comparisons of poverty and the effectiveness of different policy interventions.
 * **Promote Data-Driven Decision-Making:** Support strategic decisions regarding resource allocation and policy design in the global effort to reduce poverty.
+
+## Repository Map
+
+Navigate the repository structure using the links below. Each folder contains a README with purpose, quick start commands, and conventions.
+
+| Folder | Description | Link |
+|--------|-------------|------|
+| **DART/** | DART validation, MIMA workflow, and methodological notes | [DART/README.md](DART/README.md) |
+| **LISSY/** | LISSY remote-execution system documentation, tutorials, and outputs | [LISSY/README.md](LISSY/README.md) |
+| **LISSY/Tutorial/** | LISSY onboarding materials and syntax examples (R, Stata) | [LISSY/Tutorial/README.md](LISSY/Tutorial/README.md) |
+| **LISSY/DART_Validation/** | Validation pipeline comparing LISSY vs DART results | [LISSY/DART_Validation/README.md](LISSY/DART_Validation/README.md) |
+| **LISSY/MIMA5/** | MIMA5 poverty rate analysis and visualizations | [LISSY/MIMA5/README.md](LISSY/MIMA5/README.md) |
+| **METIS-LIS/** | LIS codebooks, MIMA indicator docs, and wave/date mappings | [METIS-LIS/README.md](METIS-LIS/README.md) |
+| **analysis/** | Parent folder for analytical pipelines | [analysis/README.md](analysis/README.md) |
+| **analysis/data-availability/** | Submatrix analysis for optimal country-year panels | [analysis/data-availability/README.md](analysis/data-availability/README.md) |
+| **scripts/** | Utility scripts (HTML-to-Markdown converter) | [scripts/README.md](scripts/README.md) |
+| **xlsxFiles/** | Source Excel data files (DART tables, codebooks) | [xlsxFiles/README.md](xlsxFiles/README.md) |
+| **xlsxConverted/** | Auto-generated CSV/JSON/Markdown outputs | [xlsxConverted/README.md](xlsxConverted/README.md) |
+| **docs/** | Project documentation and reference materials | [docs/README.md](docs/README.md) |
+
+### Key Entry Points
+
+- **New to LIS/LISSY?** Start with [LISSY/Tutorial/README.md](LISSY/Tutorial/README.md)
+- **Run MIMA workflow:** See [DART/MIMA/README.md](DART/MIMA/README.md) and `compute_mima.py`
+- **Validate DART data:** See [DART/README.md](DART/README.md) and [LISSY/DART_Validation/README.md](LISSY/DART_Validation/README.md)
+- **Analyze data availability:** See [analysis/data-availability/README.md](analysis/data-availability/README.md)
diff --git a/analysis/README.md b/analysis/README.md
@@ -0,0 +1,41 @@
+# analysis/
+
+## Purpose
+
+Parent folder for analytical pipelines and research modules. Each subfolder contains a self-contained analysis with its own README, requirements, and outputs.
+
+## Contents
+
+- **data-availability/** - Submatrix analysis finding optimal country-year panels in OECD income data
+
+## Quick start
+
+Navigate to specific analysis folders for detailed instructions:
+
+```bash
+# Run data availability analysis
+# (from the repository root, per the conventions below)
+python analysis/data-availability/run.py
+```
+
+See individual folder READMEs for requirements, inputs, and outputs.
+
+## Conventions
+
+- Each analysis subfolder is **self-contained** with its own `requirements.txt`
+- Analysis scripts run from the **repository root** (not from analysis/)
+- Use Python virtual environments to isolate dependencies
+- Large outputs (plots, JSON results) may be gitignored - check folder READMEs for regeneration steps
+
+## Adding New Analyses
+
+1. Create subfolder: `analysis/my-analysis/`
+2. Add `README.md` documenting purpose, inputs, outputs, and commands
+3. Add `requirements.txt` if Python dependencies are needed
+4. Include example run command in README
+5. Update this parent README to list the new analysis
+
+## Related Folders
+
+- **xlsxConverted/csvFiles/** - Common input source for many analyses
+- **DART/** - DART-specific analysis and validation
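Steps 1 to 3 of "Adding New Analyses" in the README above could be automated with a small helper. The sketch below is hypothetical (no such script is part of this pull request): `scaffold_analysis` and the README template text are illustrative names, and the demo runs in a temporary directory so that nothing in a real checkout is touched.

```python
import tempfile
from pathlib import Path

README_TEMPLATE = """# analysis/{name}/

## Purpose

## Quick start

## Conventions
"""


def scaffold_analysis(repo_root, name):
    """Create analysis/<name>/ with a README.md stub and an empty requirements.txt."""
    folder = Path(repo_root) / "analysis" / name
    folder.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing analysis
    (folder / "README.md").write_text(README_TEMPLATE.format(name=name), encoding="utf-8")
    (folder / "requirements.txt").write_text("", encoding="utf-8")
    return folder


# Demo in a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as fake_repo:
    new_folder = scaffold_analysis(fake_repo, "my-analysis")
    created = sorted(p.name for p in new_folder.iterdir())
    readme_head = (new_folder / "README.md").read_text(encoding="utf-8").splitlines()[0]

print(created)
print(readme_head)
```

Steps 4 and 5 (adding an example run command and listing the new analysis in the parent README) remain manual edits in this sketch.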