
GeographicDataService/unified-uk-census-2021-22


Unified dataset of 2021/2022 UK Census variables for small areas


This repository contains tools and scripts for harmonising and processing census data from the 2021 and 2022 UK Censuses. The data release is available here: GeoDS: Unified UK Census Data

The process is described in detail in the paper:

Abstract

The dataset is the first unified release covering all four UK nations at the smallest available geographic level: Output Areas in England, Wales, and Scotland, and Data Zones in Northern Ireland. The UK’s three census agencies—ONS (England & Wales), NRS (Scotland), and NISRA (Northern Ireland)—release their data separately, each with distinct variables, formats, and disclosure controls. Through a process of matching, standardisation, and aggregation, 190 comparable variables are produced. The dataset is made available as a series of topic tables indexed across all 239,023 of the UK’s small-area geographies. By providing a standardised dataset, this work enables seamless UK-wide analyses, facilitating cross-national comparisons and supporting research and public policy development.

Project Structure

└── 📁UK_Census_Data_21_22/
    └── 📁data
        └── 📁output_data_set # The produced dataset, including unified Census tables for the UK and associated metadata
        └── 📁individual_country_census_data # Downloaded Census tables for England & Wales, Scotland and Northern Ireland
        └── 📁uk_census_data # Unified Census tables for the United Kingdom
        └── 📁uk_matching_output # Outputs from the manual matching process between countries
        └── 📁validation_plots # Plots validating the matching for each variable
    └── 📁src
        └── reproduce_ukdataset_creation.py # Script to fully reproduce the creation of the dataset
        └── download_census_data_1.py
        └── produce_uk_tables_2.py
        └── producevalidation_plots_3.py
        └── 📁census_download_scripts # Scripts for downloading census data from each country
        └── 📁utils
    └── README.md
    └── requirements.txt

Installation

  1. Clone the repository:
    git clone git@github.com:ogoodwin505/UK_Census_Data_21_22.git
    cd UK_Census_Data_21_22
    
  2. Install dependencies using pip and a virtual environment:
    python -m venv .venv
    source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
    pip install -r requirements.txt
    

Reproduce Data Set

To reproduce the released dataset, run:

python src/reproduce_ukdataset_creation.py

This performs all stages of the process:

  1. Download the Census data from each country's source.
  2. Unify that data into UK tables based on the variable matches in data/uk_matching_output/VariableMatchLookup.csv.
  3. Produce validation plots for each variable in the new dataset.

The dataset is produced in unified_census_data_set.
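
The unification stage can be pictured as a lookup-driven aggregation: each unified variable is the sum of the source columns matched to it for a given country. The sketch below is illustrative only and is not the repository's implementation. The lookup columns (`uk_variable`, `country`, `source_variable`), the variable names, and the counts are all hypothetical; the real layout of VariableMatchLookup.csv may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical lookup layout; the real VariableMatchLookup.csv may differ.
LOOKUP_CSV = """uk_variable,country,source_variable
age_0_15,Scotland,age_0_4
age_0_15,Scotland,age_5_15
age_16_plus,Scotland,age_16_plus
"""


def load_matches(lookup_file, country):
    """Map each unified variable to the source columns matched to it for one country."""
    matches = defaultdict(list)
    for row in csv.DictReader(lookup_file):
        if row["country"] == country:
            matches[row["uk_variable"]].append(row["source_variable"])
    return dict(matches)


def unify_row(source_row, matches):
    """Sum the matched source counts to produce one row of a unified table."""
    return {
        uk_var: sum(source_row[col] for col in cols)
        for uk_var, cols in matches.items()
    }


matches = load_matches(io.StringIO(LOOKUP_CSV), "Scotland")
# Made-up counts for a single small area.
unified = unify_row({"age_0_4": 12, "age_5_15": 30, "age_16_plus": 210}, matches)
print(unified)
```

Keeping the matches in a lookup file rather than in code means a match can be corrected without touching the aggregation logic.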

UK Tables and Metadata

The final output of this code is written to the unified_census_data_set directory. This is the released data product, which is also available on Figshare.

📁unified_census_data_set
    └── 📁topic_tables
        └── 📁csv
        └── 📁parquet
    └── Table_Notes.csv
    └── Variable_Metadata.csv

There are 25 unified topic tables, each available in both CSV and Parquet format. Table_Notes.csv lists the table titles, with notes on any features of interest in the harmonisation process. Variable_Metadata.csv contains the lookup between the variable IDs and the full variable descriptions.
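
As a rough illustration of how the metadata lookup might be used, the sketch below replaces variable IDs in a table header with their full descriptions. The column names (`variable_id`, `description`), the IDs, and the sample rows are assumptions made for the example; check the header of Variable_Metadata.csv for the actual names.

```python
import csv
import io

# Hypothetical stand-in for Variable_Metadata.csv; real column names may differ.
SAMPLE_METADATA = """variable_id,description
v001,Usual residents: total
v002,Usual residents: aged 0 to 15
"""


def load_variable_descriptions(metadata_file, id_col="variable_id", desc_col="description"):
    """Build a dict from variable id to full description."""
    return {row[id_col]: row[desc_col] for row in csv.DictReader(metadata_file)}


def describe_columns(columns, descriptions):
    """Swap variable ids for descriptions; unknown columns (e.g. an area code) are kept as-is."""
    return [descriptions.get(col, col) for col in columns]


descriptions = load_variable_descriptions(io.StringIO(SAMPLE_METADATA))
print(describe_columns(["area_code", "v001", "v002"], descriptions))
```

To use the real file, pass a handle opened with `open(path, newline="")` in place of the `io.StringIO` sample.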

Validation Plots

validation_plots contains boxplots and histograms for each variable in the unified dataset. These show the normalised distributions of each variable (its value divided by the table total), separated by country.
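
The normalisation behind these plots can be sketched as follows: divide each variable by its table total, then group the resulting shares by country. This is a minimal sketch, not the repository's plotting code. It assumes the table total is available as a column and that the country can be read from the first letter of the area code (E, W, S, N); the column names, area codes, and counts are made up.

```python
from collections import defaultdict

# Assumed mapping from the first letter of an area code to its country.
COUNTRY_BY_PREFIX = {
    "E": "England",
    "W": "Wales",
    "S": "Scotland",
    "N": "Northern Ireland",
}


def normalised_shares(rows, variable, total):
    """Return {country: [variable / total, ...]}, skipping areas with a zero total."""
    shares = defaultdict(list)
    for row in rows:
        if row[total] == 0:
            continue  # a share is undefined when the table total is zero
        country = COUNTRY_BY_PREFIX[row["area_code"][0]]
        shares[country].append(row[variable] / row[total])
    return dict(shares)


# Made-up rows with hypothetical column names and area codes.
rows = [
    {"area_code": "E00000001", "v002": 25, "v001": 100},
    {"area_code": "S00000001", "v002": 10, "v001": 50},
    {"area_code": "N00000001", "v002": 0, "v001": 0},
]
shares = normalised_shares(rows, "v002", "v001")
print(shares)
```

Each country's list of shares could then be passed to a boxplot or histogram routine to compare distributions across countries.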

Acknowledgements

  • ONS: Office for National Statistics (England & Wales)
  • NRS: National Records of Scotland
  • NISRA: Northern Ireland Statistics and Research Agency

About

Reproducible pipeline for harmonising the 2021/2022 UK Censuses (ONS, NRS, NISRA) into 190 comparable variables for small-area geographies (OAs/DZs)
