Replication materials for the paper "Glyphosate exposure and GM seed rollout unequally reduced perinatal health"
We run the analysis using make version 4.4.1 and R version 4.4.1. We use renv to manage packages. To get started, install renv and run renv::restore() to download and install the package versions used in this project, which are recorded in the renv.lock file.
make data-clean generates all of the intermediate files needed for the analysis and takes about 4 minutes to run. It does not run two categories of targets: downloading data and the water ML pipeline. Both of those take a while to run. Their outputs, along with the data we downloaded manually, are grouped together in the data/download-manual, data/download-script, and data/watershed directories.
make data-raw will create all of the files in the data/raw directory, make data-clean creates all of the files in the data/clean directory, make water creates all of the files in the data/watershed directory, and make data-download creates all of the files in the data/download-script directory.
For analysis:
- make desc-figs will create the descriptive time series plots and maps, as well as some appendix figures
- make cnty-results runs the county level analysis (event studies, DiD, TSLS for many outcomes and different treatment vars) and creates the event study figures. It also does the Ag district level analysis
- make predict-bw runs scripts to train birthweight prediction models and generate predictions for each birth
- make micro-mods runs the birth-level analysis
- make micro-results creates figures from the birth-level analysis
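The targets above can be chained in a small shell function. This is only a sketch: it assumes the targets are meant to run in the order they are listed, with make data-clean first, and MAKE_CMD is a hypothetical override (not part of the repository) that allows a dry run.

```sh
# Sketch: run the data-cleaning step, then each analysis target in the order listed.
# MAKE_CMD is a hypothetical override (defaults to make) so the loop can be dry-run.
run_analysis() {
  for target in data-clean desc-figs cnty-results predict-bw micro-mods micro-results; do
    echo "==> make ${target}"
    # Stop at the first failing target so later steps do not run on partial outputs
    ${MAKE_CMD:-make} "$target" || return 1
  done
}
```

Calling run_analysis from the repository root runs every step. Setting MAKE_CMD=echo first only prints the target names without building anything.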
The following API keys are required. Save them to .Renviron with usethis::edit_r_environ():
- USDA QuickStats API key, saved in .Renviron as NASS_KEY
- Census API key, saved in .Renviron as CENSUS_KEY
- BEA API key, saved in .Renviron as BEA_KEY
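Before running the download targets, it may help to confirm that all three keys are present. The function below is a minimal sketch, not part of the repository. It assumes the keys are stored one per line as NAME=value and that .Renviron is in the home directory unless another path is given.

```sh
# Sketch: report which required API keys are missing from an .Renviron file.
# Usage: check_renviron_keys [path]   (path defaults to ~/.Renviron, an assumption)
check_renviron_keys() {
  renviron="${1:-$HOME/.Renviron}"
  missing=0
  for key in NASS_KEY CENSUS_KEY BEA_KEY; do
    # Look for a line such as NASS_KEY=abc123 with some text after the equals sign
    if ! grep -Eq "^[[:space:]]*${key}=.+" "$renviron" 2>/dev/null; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  # Exit status 0 only when every key was found
  return "$missing"
}
```

The function prints one line per missing key and returns a non-zero status if any key is absent, so it can gate a later command with &&.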
Most data required for our analysis is included in this repository, but some files are too large or not allowed to be shared publicly.
Instructions for obtaining access to the restricted Births (Natality) and Deaths (Mortality) files are on the NCHS website. Our primary analysis uses the natality files from 1990 to 2013, and a supplemental analysis uses the mortality files over the same period. Once obtained, the raw natality and mortality files go into data/health-restricted/raw.
Download the HydroBASINS data for North America and copy the contents into data/watersheds/hydrobasins.
There are two pieces of data from the USGS's gridded soil survey needed for water analysis, which the USGS hosts on Box here:
- Download MUKEY Grids (TIF)/FY2021_gNATSGO_mukey_grid.zip and place the contents here: data/watersheds/soil-quality/gNATSGO_mukey_grid
- Download MUKEY Grids (TIF)/FY2021_gNATSGO_Tabular_CSV.zip and place the contents here: data/watersheds/soil-quality/gNATSGO_Tabular_CSV
GAEZ attainable yield data are in data/download-manual/attainable-yield/. The files can also be downloaded using the following links:
- Soy high: res05/CRUTS32/Hist/8110H/ylHr_soy.tif
- Soy low: res05/CRUTS32/Hist/8110H/ylLr_soy.tif
- Corn high: res05/CRUTS32/Hist/8110H/ylHr_mze.tif
- Corn low: res05/CRUTS32/Hist/8110H/ylLr_mze.tif
- Cotton high: res05/CRUTS32/Hist/8110H/ylHr_cot.tif
- Cotton low: res05/CRUTS32/Hist/8110H/ylLr_cot.tif
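The six paths above follow a regular pattern (a crop code and an H or L input level), so they can be generated rather than typed. The helper below is an illustrative sketch. The function name is hypothetical, and the GAEZ base URL is not reproduced in this README, so the helper prints only the relative paths.

```sh
# Sketch: print the six GAEZ relative paths listed above.
# Crop codes: soy (soy), mze (corn), cot (cotton); level H = high, L = low.
gaez_paths() {
  for crop in soy mze cot; do
    for level in H L; do
      echo "res05/CRUTS32/Hist/8110H/yl${level}r_${crop}.tif"
    done
  done
}
```

Each printed path could then be appended to the GAEZ base URL and passed to a downloader of your choice.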
The USDA Agricultural Statistics District to County FIPS crosswalk is here.
SEER U.S. County Population Data: Single-age adjusted files starting in 1969 and 1990 (i.e., us.1969_2022.singleages.adjusted.txt and us.1990_2022.singleages.adjusted.txt). These files go into the data/download-manual directory.
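Once the manually obtained data are in place, a quick check of the expected locations can catch a misplaced folder before a long run. The function below is a sketch under stated assumptions: the paths are copied as written in this README, the two SEER files are assumed to sit directly inside data/download-manual, and the function name is hypothetical.

```sh
# Sketch: verify that manually placed data exist under a repository root.
# Usage: check_manual_data [repo-root]   (defaults to the current directory)
check_manual_data() {
  root="${1:-.}"
  status=0
  for path in \
    data/health-restricted/raw \
    data/watersheds/hydrobasins \
    data/watersheds/soil-quality/gNATSGO_mukey_grid \
    data/watersheds/soil-quality/gNATSGO_Tabular_CSV \
    data/download-manual/attainable-yield \
    data/download-manual/us.1969_2022.singleages.adjusted.txt \
    data/download-manual/us.1990_2022.singleages.adjusted.txt
  do
    # -e accepts both directories and regular files
    if [ ! -e "${root}/${path}" ]; then
      echo "not found: ${path}"
      status=1
    fi
  done
  return "$status"
}
```

The check only confirms that each location exists. It does not inspect the contents of the directories.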