Author: Mattias Antar | University: Linköping University | Supervisors: Adel Daoud & Connor Jerzak | Year: 2025/2026
This repository contains the complete replication code, data processing pipelines, and analytical datasets used to generate the findings for my Master's thesis.
This thesis investigates: To what extent do World Bank and Chinese aid projects causally influence neighborhood-level wealth trajectories across African countries? As China emerges as a major donor with a distinct "non-interference" approach, in contrast to the World Bank's conditionality-based model, comparing their effectiveness at the local level is important for understanding development aid to the continent.
The study overcomes the scarcity of local economic data by leveraging satellite-derived International Wealth Index (IWI) estimates, provided in tabular form at the DHS-cluster level, to construct a high-resolution longitudinal panel. To estimate causal effects, the analysis employs a Difference-in-Differences (DiD) framework. It contrasts the conventional Two-Way Fixed Effects (TWFE) model with the more robust de Chaisemartin & d'Haultfoeuille (dCdH) estimator to correct for biases introduced by staggered treatment timing and heterogeneous effects.
The analysis relies on panel datasets located in the `Data/Archive_enriched/` directory. These files integrate high-precision geocoded aid data with satellite-based IWI wealth measures and contextual covariates at the DHS-cluster level, where each DHS cluster is interpreted as representing a neighborhood.
- Treatment: Geocoded aid projects from AidData (World Bank v1.4.2 and Global Chinese Development Finance v1.1.1). Only projects with precision codes 1-3 (exact locations, buffered locations, or administrative-level centroids) were included to ensure that treatment exposure can be meaningfully defined at the same local neighborhood scale as the satellite-derived IWI wealth outcome.
- Outcome: The International Wealth Index (IWI), a continuous asset-based measure of household wealth (0-100) derived from satellite imagery using deep learning models trained on DHS survey data (Pettersson et al., 2023).
To ensure robustness, the analysis included the following covariates in separate DiD runs:
- `log_avg_pop_dens`: Log of average population density at the local level, serving as a proxy for urbanization.
- `log_3yr_pre_conflict_deaths`: Log of battle-related deaths (from UCDP) in the surrounding administrative region during the previous three years, capturing exposure to violence and donor responsiveness to instability.
- `log_disasters`: Log count of natural disasters (e.g., floods, droughts).
- `election_year`: A binary indicator of whether a national executive election occurred in the pre-project period, capturing political cycles and potential strategic allocation of aid.
- `political_stability`: A World Bank Governance Indicator measuring perceptions of political stability and absence of violence.
- `leader_birthplace`: A binary indicator equal to 1 if the neighborhood (DHS cluster) lies within the national executive leader's home region, included to adjust for documented political capture and favoritism in aid distribution.
The data is split into 23 distinct CSV files, where each file represents a unique Donor-Sector Panel. This allows for granular analysis of specific aid types.
File Naming Convention: `InputData_{FUNDER}_{SECTOR_CODE}_DiD_enriched.csv`
- `InputData_ch_110_DiD_enriched.csv` → China (ch) aid in the Education (110) sector
- `InputData_wb_120_DiD_enriched.csv` → World Bank (wb) aid in the Health (120) sector
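The naming convention above can be parsed programmatically when looping over the panels. The following is a minimal base-R sketch, not code from the repository: `parse_panel_name` is a hypothetical helper, and the pattern assumes funder codes are lowercase letters and sector codes are digits, as in the two examples.

```r
# Hypothetical helper: split a panel file name into funder and sector code.
# Assumes the convention InputData_{FUNDER}_{SECTOR_CODE}_DiD_enriched.csv,
# with a lowercase-letter funder code and a numeric sector code.
parse_panel_name <- function(path) {
  fname <- basename(path)
  pattern <- "^InputData_([a-z]+)_([0-9]+)_DiD_enriched\\.csv$"
  # regmatches() + regexec() returns the full match followed by the groups
  m <- regmatches(fname, regexec(pattern, fname))[[1]]
  if (length(m) != 3) {
    stop("Unexpected file name: ", fname)
  }
  list(funder = m[2], sector_code = m[3])
}

info <- parse_panel_name("Data/Archive_enriched/InputData_ch_110_DiD_enriched.csv")
print(info$funder)
print(info$sector_code)
```

In practice the helper could be applied to the result of `list.files()` on the `Data/Archive_enriched/` directory; `sector_group_names.csv` would not match the pattern and would need to be filtered out first.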
The repository is organized into four main components that mirror the analytical workflow: Input, Estimation, Robustness, and Synthesis.
thesis_code/
├── Data/
│   └── Archive_enriched/                      # 23 panel datasets
│       ├── InputData_{funder}_{sector}_DiD_enriched.csv (×23)
│       └── sector_group_names.csv             # Sector code lookup
├── R/
│   ├── TWFE/                                  # Two-Way Fixed Effects models
│   │   ├── TWFE.R                             # Baseline (no covariates)
│   │   └── TWFE_with_covariates.R             # With covariates
│   ├── dCdH/
│   │   └── dcdh_with_covariates.R             # dCdH estimator (with covariates)
│   ├── Robustness/
│   │   └── Robustness.R                       # Spatial spillover checks
│   └── Plots and Tables/
│       ├── THESISTABLES.R                     # Generate tables (HTML)
│       ├── plotandtables.R                    # Generate figures and additional tables
│       └── Output/                            # Tables
├── Stata/
│   ├── dCdH.do                                # Baseline dCdH in STATA (no covariates)
│   └── dcdh_results/
│       ├── DiD_Table_{funder}_{sector}.csv (×23)  # Individual panel results
│       └── dcdh_combined_all_panels.csv       # Combined Stata results
└── output/
    ├── dCdH/
    │   ├── dcdh_results.csv                   # R dCdH with covariates (23 panels)
    │   └── dcdh_no_covariates_results.csv     # R dCdH without covariates
    ├── TWFE/
    │   ├── without_covariates/
    │   │   ├── tables/
    │   │   ├── plots/
    │   │   └── twfe_combined_all_panels.csv
    │   └── with_covariates/
    │       ├── tables/
    │       ├── graphs/
    │       └── twfe_with_cov_combined_all_panels.csv
    └── spillover/                             # Robustness analysis output
        ├── Table1_band_tidy.csv
        ├── Table2_buffer_results.csv
        └── Table3_Moran_summary.csv
| Object | Output File | Description |
|---|---|---|
| dCdH | `Stata/dcdh_results/dcdh_combined_all_panels.csv` | Stata, no covariates |
| dCdH | `output/dCdH/dcdh_results.csv` | R, with 6 covariates |
| TWFE (no cov) | `output/TWFE/without_covariates/twfe_combined_all_panels.csv` | R, no covariates |
| TWFE (with cov) | `output/TWFE/with_covariates/twfe_with_cov_combined_all_panels.csv` | R, with 6 covariates |
| Robustness | `output/spillover/*.csv` | Spatial spillover analysis |
| Tables | `R/Plots and Tables/Output/*.html` | Final publication tables |
The analysis follows a reproducible pipeline, including robustness checks.
Script: Stata/dCdH.do
- Purpose: Runs the dCdH DiD model without covariates.
- Details: Loops through all panel datasets and applies the dCdH estimator to each one.
- Output: Combined results in `Stata/dcdh_results/dcdh_combined_all_panels.csv`
This step establishes the baseline for comparison and tests if results hold when controlling for the 6 covariates.
- TWFE (Baseline & Covariates Adjusted):
  - `R/TWFE/TWFE.R`: Standard Two-Way Fixed Effects event-study model
  - `R/TWFE/TWFE_with_covariates.R`: TWFE with 6 covariates
- dCdH (Robustness):
  - `R/dCdH/dcdh_with_covariates.R`: dCdH estimator with 6 covariates using `DIDmultiplegtDYN`
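To illustrate what a two-way fixed effects specification estimates, here is a minimal sketch on simulated data. It is not the repository's code: it uses base R `lm()` with unit and year dummies and a single static treatment coefficient rather than the event-study model in `R/TWFE/TWFE.R`, and every variable name (`unit`, `year`, `treated`, `iwi`) as well as the simulated design is an assumption made for illustration.

```r
set.seed(1)
n_units <- 40
n_years <- 6
panel <- expand.grid(unit = 1:n_units, year = 1:n_years)

# Hypothetical design: the first 20 units are treated from year 4 onward.
panel$treated <- as.integer(panel$unit <= 20 & panel$year >= 4)

unit_effect <- rnorm(n_units)                     # time-invariant unit heterogeneity
year_effect <- seq(0, 2.5, length.out = n_years)  # shock common to all units in a year
true_effect <- 2                                  # homogeneous treatment effect

panel$iwi <- unit_effect[panel$unit] + year_effect[panel$year] +
  true_effect * panel$treated + rnorm(nrow(panel), sd = 0.1)

# Two-way fixed effects: unit and year dummies absorb both sources of heterogeneity,
# so the coefficient on `treated` is the difference-in-differences estimate.
fit <- lm(iwi ~ treated + factor(unit) + factor(year), data = panel)
twfe_estimate <- unname(coef(fit)["treated"])
print(twfe_estimate)
```

With a single adoption date and a homogeneous effect, as simulated here, the TWFE coefficient should land close to the true effect. Staggered treatment timing combined with heterogeneous effects is the setting in which TWFE can be biased, which is the motivation for running the dCdH estimator alongside it.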
Script: R/Robustness/Robustness.R
- Purpose: Provides additional diagnostic checks for spatial spillovers in the context of the thesis.
- Diagnostic Tests:
  - Distance-Based Dose-Response: Tests if IWI gradients fade with distance from projects
  - Exclusion Buffer: Re-estimates effects while removing control units within 10-30 km of treated sites
  - Moran's I: Tests for spatial autocorrelation in model residuals
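The exclusion-buffer idea can be sketched in a few lines of base R. This is an illustrative sketch only and may differ from what `Robustness.R` does: the helper names, the column names (`id`, `lat`, `lon`, `treated`), and the use of the haversine formula with a 6371 km Earth radius are all assumptions.

```r
# Great-circle distance in km from one point to a vector of points (haversine formula).
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: drop control units lying within `buffer_km` of any treated unit.
apply_exclusion_buffer <- function(units, buffer_km) {
  treated <- units[units$treated == 1, ]
  # Distance from each unit to its nearest treated unit
  min_dist <- vapply(seq_len(nrow(units)), function(i) {
    min(haversine_km(units$lat[i], units$lon[i], treated$lat, treated$lon))
  }, numeric(1))
  keep <- units$treated == 1 | min_dist > buffer_km
  units[keep, ]
}

# Toy data: B is a control about 5.6 km from treated unit A, C is about 111 km away.
units <- data.frame(
  id      = c("A", "B", "C"),
  lat     = c(0, 0.05, 1),
  lon     = c(30, 30, 30),
  treated = c(1, 0, 0),
  stringsAsFactors = FALSE
)
filtered <- apply_exclusion_buffer(units, buffer_km = 10)
print(filtered$id)
```

Re-running the estimator on the filtered data for several buffer widths (for example 10, 20, and 30 km) shows whether the estimates are sensitive to nearby controls that may themselves be partially exposed to the project.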
Scripts: `R/Plots and Tables/`
- `THESISTABLES.R`: Uses the `gt` package to create HTML summary tables (Tables 1-4, A1-A7)
- `plotandtables.R`: Generates visualizations and tables
- Output Location: `R/Plots and Tables/Output/`
To replicate this analysis, you will need R and Stata installed.
The project uses the here package to manage relative paths.
install.packages(c(
"fixest",
"DIDmultiplegtDYN",
"tidyverse",
"janitor",
"sf",
"spdep",
"rnaturalearth",
"gt",
"here",
"pacman"
))

Install the required Stata packages:

ssc install did_multiplegt_dyn, replace
ssc install event_plot, replace
ssc install estout, replace
ssc install fs, replace

To reproduce the thesis results, run the scripts in the following recommended order:
- Run dCdH Baseline (Stata):
  - Open `Stata/dCdH.do` and execute
- Run TWFE & Diagnostics (R):
  - Run `R/TWFE/TWFE.R` (Baseline)
  - Run `R/TWFE/TWFE_with_covariates.R`
- Run dCdH with Covariates (R):
  - Run `R/dCdH/dcdh_with_covariates.R`
- Run Robustness Checks (R):
  - Run `R/Robustness/Robustness.R`
- Compile Final Outputs (R):
  - Run `R/Plots and Tables/THESISTABLES.R` to generate tables
  - Run `R/Plots and Tables/plotandtables.R` to generate figures and additional tables