This is the code for a bachelor thesis titled "Statistical analysis of population changes in the Lombardy region in relation to income and housing prices" by G. Caprotti and S. Zapperi.
The project is structured as a data pipeline built on the targets framework for R.
Please refer to the targets package documentation for a thorough description of the framework.
The pipeline reads raw data from the data folder, writes the resulting datasets to the export folder and writes figures to the figures folder.
The following R packages need to be installed:
tidyverse
sf
areal
ggplot2
corrplot
rmapshaper
sqids
box
scales
cowplot
purrr
Due to licensing, the Eurostat GISCO grid used to compute population estimates cannot be distributed with this repository.
The pipeline expects the grid to be placed at data/census.gpkg.
For convenience, a simple script is provided that downloads the (very large) grid and filters it, keeping only the portion covering Lombardy.
The script, scripts/download-census-eurostat.sh, requires wget and sqlite3; it is commented in detail, should one want to change its default filtering behavior.
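Once the grid is in place, a quick way to confirm that data/census.gpkg is a usable GeoPackage is to read its gpkg_contents table. A GeoPackage is an SQLite database, and the OGC GeoPackage specification requires every file to contain this table. The pipeline itself is written in R; what follows is only a minimal standalone Python sketch (standard library only), and the function name list_feature_layers is hypothetical, not part of this repository.

```python
import sqlite3
from pathlib import Path


def list_feature_layers(gpkg_path):
    """Return the names of the feature layers registered in a GeoPackage.

    Hypothetical helper, not part of the pipeline.
    """
    path = Path(gpkg_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found: run the download script first")
    # Open read-only through an SQLite URI so the check never modifies the grid.
    con = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        # gpkg_contents is mandatory in every GeoPackage; feature layers have
        # data_type = 'features'.
        rows = con.execute(
            "SELECT table_name FROM gpkg_contents WHERE data_type = 'features'"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

Called from the repository root as list_feature_layers("data/census.gpkg"), it should return at least one layer name. An sqlite3.DatabaseError (or its subclass sqlite3.OperationalError) instead suggests that the file is not a valid GeoPackage.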
Due to licensing, raw OMI data cannot be included in the data folder.
To run the pipeline successfully, one needs to download the OMI data from "Agenzia delle Entrate -> Area Riservata -> Fornitura dati OMI".
The downloaded data need manual intervention before the pipeline can parse them; specifically, a folder structure like the following example is expected:
data/omi
├── 2011                # subfolders corresponding to years
│   ├── quotazioni.csv  # CSV file with OMI data
│   └── zone            # folder containing KML files for OMI zones
└── 2021
    ├── quotazioni.csv
    └── zone
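Because this layout has to be assembled by hand, it may help to check it before running the pipeline. The sketch below is Python with the standard library only, and check_omi_layout is a hypothetical helper, not part of the repository. It verifies only that the files and folders shown above exist and that each zone folder holds at least one .kml file. It assumes that year folders have purely numeric names and does not validate file contents.

```python
from pathlib import Path


def check_omi_layout(root="data/omi"):
    """Return a list of layout problems; an empty list means the layout looks right.

    Hypothetical helper, not part of the pipeline.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is missing"]
    problems = []
    # Year subfolders are taken to be folders with all-digit names, e.g. 2011.
    years = sorted(p for p in root.iterdir() if p.is_dir() and p.name.isdigit())
    if not years:
        problems.append(f"{root}: no year subfolders found")
    for year in years:
        if not (year / "quotazioni.csv").is_file():
            problems.append(f"{year}: quotazioni.csv missing")
        zone = year / "zone"
        if not zone.is_dir():
            problems.append(f"{year}: zone folder missing")
        elif not any(f.suffix.lower() == ".kml" for f in zone.iterdir()):
            problems.append(f"{year}: zone folder contains no .kml files")
    return problems
```

Run from the repository root, print(check_omi_layout()) prints an empty list when every year folder is complete.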
Raw income data are published by "Ministero dell'Economia e delle Finanze" on its OpenData page under the CC-BY-3.0-IT license.
For this reason, the raw income data are already included in the data folder and need no manual intervention.
To add more data, use the following folder structure:
data/Redditi
├── 2011                  # subfolders corresponding to years
│   ├── comunali.csv      # municipal income data
│   └── subcomunali.csv   # submunicipal income data
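When adding years, a minimal check can report which of the two expected files each year folder under data/Redditi is missing. This is a Python sketch using only the standard library. The name income_years is a hypothetical helper, not part of the pipeline, and it inspects file names only, not the CSV contents.

```python
from pathlib import Path

# File names expected in every year folder, as in the layout above.
REQUIRED = ("comunali.csv", "subcomunali.csv")


def income_years(root="data/Redditi"):
    """Map each year folder name to the list of required files it lacks.

    Hypothetical helper, not part of the pipeline. Raises FileNotFoundError
    if the root folder does not exist.
    """
    root = Path(root)
    report = {}
    for year in sorted(p for p in root.iterdir() if p.is_dir()):
        report[year.name] = [name for name in REQUIRED if not (year / name).is_file()]
    return report
```

An empty list for a year means both files are present.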
Interpolated datasets are available in the export folder.
Datasets that are not in GeoPackage format contain a geom_id column; joining it with the geom_id column of the master_grid.gpkg file associates each row with its geometry.
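The geom_id join can be sketched in Python with the standard library, because a GeoPackage is an SQLite database. This is a minimal illustration under stated assumptions, not the project's own code. The exported dataset is assumed to be a UTF-8 CSV with a geom_id column, and the grid is assumed to be the first feature layer registered in master_grid.gpkg. The name join_on_geom_id is hypothetical. Geometries are returned as raw GeoPackage blobs and are not decoded.

```python
import csv
import sqlite3
from pathlib import Path


def join_on_geom_id(csv_path, gpkg_path):
    """Return (joined rows, unmatched geom_ids) for a CSV keyed by geom_id.

    Hypothetical helper; assumes a UTF-8 CSV with a geom_id column.
    """
    uri = Path(gpkg_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        # Assumption: the grid is the first feature layer listed in gpkg_contents.
        row = con.execute(
            "SELECT table_name FROM gpkg_contents "
            "WHERE data_type = 'features' LIMIT 1"
        ).fetchone()
        if row is None:
            raise ValueError(f"{gpkg_path} has no feature layer")
        layer = row[0]
        # The GeoPackage spec records each layer's geometry column here.
        (geom_col,) = con.execute(
            "SELECT column_name FROM gpkg_geometry_columns WHERE table_name = ?",
            (layer,),
        ).fetchone()
        # Keys are compared as strings, since csv reads every field as text.
        grid = {
            str(gid): geom
            for gid, geom in con.execute(
                f'SELECT geom_id, "{geom_col}" FROM "{layer}"'
            )
        }
    finally:
        con.close()

    joined, unmatched = [], []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            gid = record["geom_id"]
            if gid in grid:
                # Geometry stays as the raw GeoPackage blob; it is not decoded here.
                joined.append({**record, "geom": grid[gid]})
            else:
                unmatched.append(gid)
    return joined, unmatched
```

In R, the same association is more naturally done by reading master_grid.gpkg with sf::st_read and joining on geom_id with a dplyr join such as left_join, which keeps the geometry as an sf column.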