62 changes: 62 additions & 0 deletions DART/README.md
@@ -0,0 +1,62 @@
# DART/

## Purpose

Validation, analysis, and visualization of LIS Key Figures from DART (the LIS Data Access Research Tool). Contains scripts that compare LISSY microdata results with the official DART tables and produce median income plots.

## Contents

- `dart_validation.py` - Validates LISSY results against DART median income tables
- `plot_dart_tables.py` - Generates visualizations from DART CSV tables
- `Methodological_Notes.md` - LIS Key Figures methodology (population coverage, income concepts, equivalence scales)
- `Methodological_Remarks.md` - Extended methodological documentation
- `dart-table_*.csv` - DART reference tables (DHI median, poverty rates)
- `dart_*_plot.png` - Generated visualizations
- **MIMA/** - Moving Average workflow (detailed README inside)

## Quick start

**Validate LISSY vs DART:**
```bash
python DART/dart_validation.py
# Outputs: dart_dhi_median_validation.csv, dart_dhi_median_error_facts.txt
```

**Plot DART tables:**
```bash
python DART/plot_dart_tables.py
# Outputs: PNG plots in DART/
```

**Run MIMA workflow:**
```bash
python compute_mima.py \
--ma-number 5 \
--countries "Canada,Germany,Luxembourg,United Kingdom,United States" \
--start-year 1985 --end-year 2021 \
--input-path "xlsxConverted/csvFiles/dart-med-pop_decomp-dhi.csv" \
--output-path "DART"
# Outputs: DART/MIMA/csv/ and DART/MIMA/visualizations/
```

See `DART/MIMA/README.md` for full MIMA documentation.

## Conventions

- CSV tables use countries as rows, years as columns
- Run scripts from the repository root, not from inside `DART/`
- Validation scripts compare LISSY outputs to DART tables and report error statistics
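
The wide layout described above (countries as rows, years as columns) usually has to be reshaped before it can be joined with other data. The following is a minimal sketch of that reshaping using only the standard library. The header name `country`, the helper `wide_to_long`, and the sample values are assumptions for illustration; they are not taken from the DART tables or from `dart_validation.py`.

```python
import csv
import io


def wide_to_long(csv_text, country_col="country"):
    """Reshape a wide table (one row per country, one column per year)
    into a list of (country, year, value) tuples, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        country = row[country_col]
        for key, cell in row.items():
            # Skip the country column, overflow fields, and missing values.
            if key is None or key == country_col:
                continue
            if cell is None or cell.strip() == "":
                continue
            records.append((country, int(key), float(cell)))
    return records


# Illustrative values only, not real DART figures.
sample = "country,1985,1986\nCanada,100.0,\nGermany,90.5,92.0\n"
long_rows = wide_to_long(sample)
```

Empty cells are dropped rather than converted to zero, so country-years without a DART value do not enter later comparisons.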

## Privacy & Secrets

No microdata is stored here - only aggregated tables and validation outputs. LISSY jobs must be run separately on the LIS remote server.

## Related Folders

- **LISSY/DART_Validation/** - Validation of LISSY job outputs against DART tables (`validate_lissy_vs_dart.py`, with R methodology notes in `R_code_steps.md`)
- **xlsxConverted/csvFiles/** - Source DART tables in CSV format
- **compute_mima.py** (root) - MIMA computation script

## Maintainers

DART tables sourced from [LIS DART Portal](https://www.lisdatacenter.org/data-access/dart/).
61 changes: 61 additions & 0 deletions LISSY/DART_Validation/README.md
@@ -0,0 +1,61 @@
# LISSY/DART_Validation/

## Purpose

Validates LISSY microdata analysis results against official DART aggregated tables. Compares median income and poverty rates computed from LIS microdata (via LISSY jobs) with published DART Key Figures.

## Contents

- `validate_lissy_vs_dart.py` - Python validation script comparing LISSY vs DART for DHI/MHI metrics
- `R_code_steps.md` - Detailed R code methodology documentation (DART compliance steps)
- `lissy_pop_median_*.csv` - LISSY job outputs (median income and poverty rates, PPP-adjusted)
- `dart_table_*.csv` - DART reference tables (downloaded from LIS DART portal)
- `comparison_*.png` - Scatter plots showing LISSY vs DART agreement
- `error_moments_*.csv` - Error statistics (mean, std, skew, kurtosis) by country

## Quick start

**Run validation:**
```bash
cd LISSY/DART_Validation
python validate_lissy_vs_dart.py
```

**Outputs:**
- `comparison_*.png` - Visual comparisons (scatter plots with 45° line)
- `error_moments_*.csv` - Statistical summaries of discrepancies

## Inputs Required

1. **DART tables** (already present):
   - `dart_table_dhi_median.csv`, `dart_table_dhi_pr.csv` (DHI median/poverty rate)
   - `dart_table_mhi_median.csv`, `dart_table_mhi_pr.csv` (MHI median/poverty rate)

2. **LISSY outputs** (run LISSY jobs separately):
   - `lissy_pop_median_dhi_ppp_median_85-21.csv`
   - `lissy_pop_median_mhi_ppp_median_85-21.csv`

These files must be generated by running R/Stata jobs on the LISSY remote system (see `LISSY/Tutorial/` for how to submit jobs).

## Conventions

- LISSY outputs use long format (country, year, value columns)
- DART tables use wide format (countries as rows, years as columns)
- Validation compares PPP-adjusted values (2017 USD)
- Error moments help identify systematic biases or noisy countries
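
The comparison these conventions describe can be sketched as follows: match each long-format LISSY value to the DART cell for the same country and year, then summarize the per-country errors by their moments. This is a hedged illustration, not the logic of `validate_lissy_vs_dart.py`. The use of relative errors, population (not sample) moments, and excess kurtosis are assumptions, and the function names and numbers are hypothetical.

```python
import math


def error_moments(errors):
    """Mean, population std, skewness, and excess kurtosis of a list of errors."""
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    if m2 == 0:
        # All errors identical: higher moments are undefined, report zeros.
        return {"mean": mean, "std": 0.0, "skew": 0.0, "kurtosis": 0.0}
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


def compare(lissy_long, dart_wide):
    """lissy_long: iterable of (country, year, value) tuples.
    dart_wide: {country: {year: value}}.
    Returns {country: moments of the relative error (lissy - dart) / dart}."""
    errors = {}
    for country, year, value in lissy_long:
        ref = dart_wide.get(country, {}).get(year)
        if ref is None or ref == 0:
            continue  # no DART reference for this country-year
        errors.setdefault(country, []).append((value - ref) / ref)
    return {country: error_moments(errs) for country, errs in errors.items()}


# Illustrative numbers only, not real LISSY or DART values.
lissy = [("Canada", 1985, 101.0), ("Canada", 1986, 99.0), ("Canada", 1987, 100.0)]
dart = {"Canada": {1985: 100.0, 1986: 100.0, 1987: 100.0}}
moments = compare(lissy, dart)
```

A mean far from zero points to a systematic bias for that country, while a large standard deviation with a near-zero mean points to noise.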

## Privacy & Secrets

**Important:** LISSY jobs access LIS microdata under strict privacy rules. Do NOT commit microdata to this repo. Only aggregated outputs (medians, poverty rates) are stored here.

See `LISSY/README.md` for LISSY registration and job submission guidelines.

## Related Folders

- **DART/** - Alternative Python validation (`dart_validation.py`)
- **LISSY/Tutorial/** - LISSY onboarding and syntax examples
- **DART/Methodological_Notes.md** - DART computation methodology

## Maintainers

Validation pipeline for ensuring LISSY job outputs match published DART figures.
60 changes: 60 additions & 0 deletions LISSY/MIMA5/README.md
@@ -0,0 +1,60 @@
# LISSY/MIMA5/

## Purpose

MIMA5 (5-year Moving Average of Median Income) poverty rate analysis and visualizations. Contains LISSY job outputs with poverty rates anchored to the 5-year moving average of median income, for comparing DHI and MHI across countries.
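
To make the indicator concrete, here is a minimal sketch of a poverty rate anchored to a 5-year moving average of the median. It assumes a trailing window (the five years ending in the reference year), a poverty line at 50% of MIMA5, and a strict "below the line" test. The authoritative definition is in `METIS-LIS/mima_indicator.md` and may differ. The function names and all numbers are hypothetical, and the real computation runs on microdata inside LISSY, not locally.

```python
def trailing_moving_average(medians, window=5):
    """medians: {year: median income}.
    Returns {year: mean of the medians over the `window` years ending at
    `year`}, only for years where every year in the window is present."""
    result = {}
    for year in sorted(medians):
        span = [medians.get(y) for y in range(year - window + 1, year + 1)]
        if all(value is not None for value in span):
            result[year] = sum(span) / window
    return result


def poverty_rate(incomes, weights, threshold):
    """Weighted share of units whose income is strictly below the threshold."""
    total = sum(weights)
    poor = sum(w for income, w in zip(incomes, weights) if income < threshold)
    return poor / total


# Illustrative numbers only.
medians = {2000: 100, 2001: 110, 2002: 120, 2003: 130, 2004: 140, 2005: 150}
mima5 = trailing_moving_average(medians)      # values for 2004 and 2005 only
threshold = 0.5 * mima5[2005]                 # poverty line at 50% of MIMA5
rate = poverty_rate([40, 60, 70, 200], [1, 1, 1, 1], threshold)
```

Because the threshold follows a smoothed median, a single year with an unusual median moves the poverty line less than it would under a same-year relative threshold.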

## Contents

- `plotting_mima5_pr.py` - Generates 4 plots comparing poverty rates and MIMA5 trends
- `lissy_mima5_*.csv` - LISSY outputs (MIMA5-based poverty rates for DHI/MHI)
- `lissy_CPI_mima5_*.csv` - CPI-adjusted MIMA5 poverty rates
- `*.png` - Visualizations (poverty rates and MIMA5 time series)
- **OLD/** - Archived outputs from previous runs

## Quick start

**Generate plots from existing CSV files:**
```bash
cd LISSY/MIMA5
python plotting_mima5_pr.py
```

**Outputs:**
- `mima5_dhi_50pp_pr.png` - DHI poverty rate (50% of MIMA5)
- `mima5_mhi_50pp_pr.png` - MHI poverty rate (50% of MIMA5)
- `mima5_dhi.png` - MIMA5 DHI time series
- `mima5_mhi.png` - MIMA5 MHI time series
- CPI-adjusted variants: `CPI_mima5_*.png`

## Inputs Required

**LISSY job outputs** (must be generated separately on LISSY):
- `lissy_mima5_dhi_50pr.csv` - DHI poverty rate @ 50% MIMA5
- `lissy_mima5_mhi_50pr.csv` - MHI poverty rate @ 50% MIMA5
- `lissy_CPI_mima5_dhi_50pr.csv` - CPI-adjusted DHI
- `lissy_CPI_mima5_mhi_50pr.csv` - CPI-adjusted MHI

Run R/Stata jobs on LISSY to compute these (see `LISSY/Tutorial/` for syntax).

## Conventions

- CSV files use long format: `country, year, pr, mima5`
- Plots use fixed country colors: Canada (green), Germany (red), UK (orange), US (blue)
- Outputs are PNG format (300 DPI recommended for publication)
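
As a rough illustration of these conventions, the sketch below reads a long-format CSV with the `country, year, pr, mima5` columns and groups it into one time series per country, next to a fixed color map. It is not the code of `plotting_mima5_pr.py`. The country labels used as dictionary keys (`UK`, `US`), the helper name, and the sample values are assumptions.

```python
import csv
import io
from collections import defaultdict

# Fixed colors per country, as listed in the conventions above.
# The exact country labels used in the CSV files are an assumption.
COUNTRY_COLORS = {"Canada": "green", "Germany": "red", "UK": "orange", "US": "blue"}


def series_by_country(csv_text, value_col="pr"):
    """Group long-format rows into {country: [(year, value), ...]} sorted by year."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["country"]].append((int(row["year"]), float(row[value_col])))
    return {country: sorted(points) for country, points in grouped.items()}


# Illustrative values only, not real LISSY outputs.
sample = (
    "country,year,pr,mima5\n"
    "Canada,1991,0.12,30000\n"
    "Canada,1990,0.11,29000\n"
    "US,1990,0.17,35000\n"
)
series = series_by_country(sample)
```

Each series can then be passed to a plotting library with `COUNTRY_COLORS[country]` as the line color, which keeps colors consistent across all four plots.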

## Privacy & Secrets

Only aggregated poverty rates and medians are stored. **Do NOT commit LIS microdata.**

LISSY jobs must be submitted via the [LISSY web interface](https://www.lisdatacenter.org/data-access/lissy/) or email. See `LISSY/README.md` for registration.

## Related Folders

- **DART/MIMA/** - MIMA computation workflow (using DART tables, not LISSY microdata)
- **METIS-LIS/mima_indicator.md** - MIMA methodology documentation
- **compute_mima.py** (root) - Python MIMA workflow for DART data

## Maintainers

MIMA5 analysis for poverty persistence research using LIS microdata.
73 changes: 73 additions & 0 deletions LISSY/README.md
@@ -0,0 +1,73 @@
# LISSY/

## Purpose

LISSY (LIS remote-execution system) documentation, tutorials, and outputs. This folder contains onboarding materials, validation scripts, and analysis results from LIS microdata jobs.

## Contents

- **Tutorial/** - LISSY onboarding, syntax examples, and exercises (comprehensive README inside)
- **DART_Validation/** - Validates LISSY job outputs against DART aggregated tables
- **MIMA5/** - MIMA5 poverty rate analysis and visualizations

## Quick start

**New to LISSY?** Start with the Tutorial:
```bash
# Read the tutorial README
cat LISSY/Tutorial/README.md

# Browse R and Stata syntax examples
ls LISSY/Tutorial/Exercises_syntax_files-R-Part_II/
```

**Validate your LISSY results:**
```bash
cd LISSY/DART_Validation
python validate_lissy_vs_dart.py
```

**Plot MIMA5 poverty rates:**
```bash
cd LISSY/MIMA5
python plotting_mima5_pr.py
```

## What is LISSY?

LISSY is a remote-execution system that allows researchers to access [LIS](https://www.lisdatacenter.org/) and [LWS](https://www.lisdatacenter.org/data-access/lws/) microdata while adhering to privacy restrictions. Researchers submit statistical programs (R, SAS, SPSS, Stata) through a web-based interface, and LISSY returns aggregated results.

## How to Register

[Register for LISSY access](https://www.lisdatacenter.org/data-access/lissy/) (1-year access, renewable annually).

## Privacy & Secrets

**Critical:** LIS microdata is confidential. NEVER commit microdata to this repository.

- Submit jobs via the [LISSY web interface](https://www.lisdatacenter.org/data-access/lissy/)
- Only commit **aggregated outputs** (tables, plots, summary statistics)
- Committing individual-level data violates the LIS terms of use
- See the [LIS Terms of Use](https://www.lisdatacenter.org/about-lis/terms-of-use/)

## Onboarding Resources

- **Tutorial/** folder in this repo (syntax examples, exercises)
- [LIS Self-Teaching Materials](https://www.lisdatacenter.org/resources/self-teaching/)
- [METIS Documentation Portal](https://www.lisdatacenter.org/frontend)
- [LIS FAQ](https://www.lisdatacenter.org/resources/faq/)
- Contact: [usersupport@lisdatacenter.org](mailto:usersupport@lisdatacenter.org)

## Citation

All papers using LIS microdata must be submitted to the LIS Working Paper series before publication. See [General Policies](https://www.lisdatacenter.org/working-papers/#general).

## Related Folders

- **METIS-LIS/** - LIS codebooks and variable documentation
- **DART/** - DART validation using aggregated tables (no microdata)
- **compute_mima.py** (root) - MIMA workflow using DART tables

## Maintainers

Documentation and examples sourced from [LIS Cross-National Data Center](https://www.lisdatacenter.org/).
41 changes: 41 additions & 0 deletions METIS-LIS/README.md
@@ -0,0 +1,41 @@
# METIS-LIS/

## Purpose

Documentation and metadata for LIS (Luxembourg Income Study) datasets, including codebooks, MIMA indicator definitions, and wave/date mappings.

## Contents

- `codebook.pdf` - LIS variable codebook (names, definitions, codes)
- `mima_indicator.md` - MIMA (Median Income Moving Average) indicator methodology
- `waves-and-dates.md` - LIS data collection waves and reference dates

## Quick start

**View codebook:**
```bash
open METIS-LIS/codebook.pdf # macOS
xdg-open METIS-LIS/codebook.pdf # Linux
```

**Review MIMA methodology:**
```bash
cat METIS-LIS/mima_indicator.md
```

## Conventions

- Files are reference documentation (read-only)
- PDF codebook is the authoritative source for LIS variable definitions
- Markdown files provide concise summaries for quick reference

## Related Resources

- [LIS METIS Portal](https://www.lisdatacenter.org/frontend) - Full online documentation
- [LIS Database](https://www.lisdatacenter.org/) - Official LIS homepage
- **DART/MIMA/** - Implementation of MIMA methodology
- **LISSY/** - Remote execution system for LIS microdata

## Maintainers

Documentation sourced from [LIS Cross-National Data Center](https://www.lisdatacenter.org/).
26 changes: 26 additions & 0 deletions README.md
@@ -27,3 +27,29 @@ This project serves to:
* **Inform Policy Evaluation:** Offer policymakers a tool for evidence-based assessment of poverty alleviation programs and policies.
* **Enable Comparative Studies:** Facilitate cross-national and cross-temporal comparisons of poverty and the effectiveness of different policy interventions.
* **Promote Data-Driven Decision-Making:** Support strategic decisions regarding resource allocation and policy design in the global effort to reduce poverty.

## Repository Map

Navigate the repository structure using the links below. Each folder contains a README with purpose, quick start commands, and conventions.

| Folder | Description | Link |
|--------|-------------|------|
| **DART/** | DART validation, MIMA workflow, and methodological notes | [DART/README.md](DART/README.md) |
| **LISSY/** | LISSY remote-execution system documentation, tutorials, and outputs | [LISSY/README.md](LISSY/README.md) |
| **LISSY/Tutorial/** | LISSY onboarding materials and syntax examples (R, Stata) | [LISSY/Tutorial/README.md](LISSY/Tutorial/README.md) |
| **LISSY/DART_Validation/** | Validation pipeline comparing LISSY vs DART results | [LISSY/DART_Validation/README.md](LISSY/DART_Validation/README.md) |
| **LISSY/MIMA5/** | MIMA5 poverty rate analysis and visualizations | [LISSY/MIMA5/README.md](LISSY/MIMA5/README.md) |
| **METIS-LIS/** | LIS codebooks, MIMA indicator docs, and wave/date mappings | [METIS-LIS/README.md](METIS-LIS/README.md) |
| **analysis/** | Parent folder for analytical pipelines | [analysis/README.md](analysis/README.md) |
| **analysis/data-availability/** | Submatrix analysis for optimal country-year panels | [analysis/data-availability/README.md](analysis/data-availability/README.md) |
| **scripts/** | Utility scripts (HTML-to-Markdown converter) | [scripts/README.md](scripts/README.md) |
| **xlsxFiles/** | Source Excel data files (DART tables, codebooks) | [xlsxFiles/README.md](xlsxFiles/README.md) |
| **xlsxConverted/** | Auto-generated CSV/JSON/Markdown outputs | [xlsxConverted/README.md](xlsxConverted/README.md) |
| **docs/** | Project documentation and reference materials | [docs/README.md](docs/README.md) |

### Key Entry Points

- **New to LIS/LISSY?** Start with [LISSY/Tutorial/README.md](LISSY/Tutorial/README.md)
- **Run MIMA workflow:** See [DART/MIMA/README.md](DART/MIMA/README.md) and `compute_mima.py`
- **Validate DART data:** See [DART/README.md](DART/README.md) and [LISSY/DART_Validation/README.md](LISSY/DART_Validation/README.md)
- **Analyze data availability:** See [analysis/data-availability/README.md](analysis/data-availability/README.md)
41 changes: 41 additions & 0 deletions analysis/README.md
@@ -0,0 +1,41 @@
# analysis/

## Purpose

Parent folder for analytical pipelines and research modules. Each subfolder contains a self-contained analysis with its own README, requirements, and outputs.

## Contents

- **data-availability/** - Submatrix analysis finding optimal country-year panels in OECD income data

## Quick start

Navigate to specific analysis folders for detailed instructions:

```bash
# Run data availability analysis
cd analysis/data-availability
python run.py
```

See individual folder READMEs for requirements, inputs, and outputs.

## Conventions

- Each analysis subfolder is **self-contained** with its own `requirements.txt`
- Analysis scripts run from the **repository root** (not from analysis/)
- Use Python virtual environments to isolate dependencies
- Large outputs (plots, JSON results) may be gitignored - check folder READMEs for regeneration steps

## Adding New Analyses

1. Create subfolder: `analysis/my-analysis/`
2. Add `README.md` documenting purpose, inputs, outputs, and commands
3. Add `requirements.txt` if Python dependencies are needed
4. Include example run command in README
5. Update this parent README to list the new analysis
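
Steps 1 to 3 above could be automated with a small helper such as the following. This is a hypothetical sketch, not a script that exists in the repository. The function name `scaffold_analysis` and the README skeleton it writes are assumptions. Steps 4 and 5 (the example run command and the parent README entry) are left to be done by hand.

```python
from pathlib import Path


def scaffold_analysis(repo_root, name):
    """Create analysis/<name>/ with a README skeleton and an empty
    requirements.txt. Raises FileExistsError if the folder already exists."""
    folder = Path(repo_root) / "analysis" / name
    folder.mkdir(parents=True, exist_ok=False)
    readme = (
        f"# analysis/{name}/\n\n"
        "## Purpose\n\n"
        "## Quick start\n\n"
        "## Conventions\n"
    )
    (folder / "README.md").write_text(readme, encoding="utf-8")
    (folder / "requirements.txt").write_text("", encoding="utf-8")
    return folder
```

Calling `scaffold_analysis(".", "my-analysis")` from the repository root would create `analysis/my-analysis/`. Using `exist_ok=False` means an existing analysis folder is never overwritten.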

## Related Folders

- **xlsxConverted/csvFiles/** - Common input source for many analyses
- **DART/** - DART-specific analysis and validation