This repository contains the R script and associated resources used to produce the final version of the UK Output Area Classification. The classification draws on Census 2021 (England, Wales & Northern Ireland) / 2022 (Scotland) data, as well as other supporting geographies, to segment the UK population into distinct clusters at the Output Area (OA) or equivalent level.
- Introduction
- Authors and Acknowledgments
- Project Structure
- Data Sources
- Prerequisites and Setup
- Workflow Overview
- Outputs
- How to Cite
- License
This repository provides the complete workflow for updating and finalising the UK Output Area Classification (UK OAC). This classification integrates socio-demographic variables derived from the latest available Censuses in England, Wales, Scotland, and Northern Ireland, along with spatial and other structural transformations.
This project updates the interim preview OAC 2021/22, which was generated using 2021 Census data for England and Wales, modelled 2021 Data Zone data for Northern Ireland, and modelled OA-level data for Scotland. The ONS has disseminated the interim preview classification for England and Wales, and the CDRC has disseminated the full UK preview classification.
The primary goal here was to produce a finalised UK-wide classification, ensuring that Scotland and Northern Ireland were fully integrated with consistent methodology and comparability to England and Wales.
Geographic Data Service:
- Alex Singleton, University of Liverpool
- Owen Goodwin, University of Liverpool
- Paul Longley, University College London
- Jakub Wyszomierski, who developed the interim preview classification but was not involved in this update
├── data
│ ├── census # Chunked or raw census files
│ ├── lookup # CSV lookups for variable codes, old vs. new
│ └── UKOAC # Files with existing/preview OAC assignments
├── extra_code
│ └── ... # Additional scripts to chunk large CSVs, etc.
├── map
│ └── ... # Spatial outputs (GPKG, etc.)
├── plot
│ └── comparison # Generated comparison plots (boxplots, alluvial diagrams)
└── README.md # This README
- `OAC_Input.parquet` / `OAC_Input_IHS.parquet`: Final tabular input data before clustering (raw and transformed).
- `UK_OAC_Final.parquet`: Final assignment of Output Areas to OAC clusters.
- `Comparison.png` / `Comparison_S_NI.png`: Alluvial plots comparing old and new classifications.
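The `_IHS` suffix on the transformed input file is assumed here to refer to an inverse hyperbolic sine (IHS) transformation. This README does not spell that out, so the following is an illustrative sketch rather than the script's actual code. The function name `ihs` and the example values are hypothetical.

```r
# Hypothetical helper: inverse hyperbolic sine, log(x + sqrt(x^2 + 1)).
# Base R exposes the same function as asinh().
ihs <- function(x) {
  log(x + sqrt(x^2 + 1))
}

# Made-up example values, not real census data.
x <- c(0, 0.5, 10, 100)
transformed <- ihs(x)

# The hand-written form agrees with the built-in.
stopifnot(isTRUE(all.equal(transformed, asinh(x))))
```

Unlike a plain logarithm, this transformation is defined at zero, which is useful for variables that are zero in many Output Areas.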
- ONS Census 2021 (England & Wales):
  - ONS Data Service
  - Downloaded from GitHub-based CSV files
- NRS Census 2022 (Scotland):
  - Scotland's Census
  - Downloaded from GitHub-based CSV files
- NISRA Census 2021 (Northern Ireland):
  - NISRA Website
  - Downloaded from GitHub-based CSV files
- ONS Geoportal for boundaries and shapefiles, including clipped EW, Scotland, and Northern Ireland geographies.
Please see the script comments for specific file paths and download links.
- R (≥ 4.0)
- Key R packages: `tidyverse`, `sf`, `magrittr`, `janitor`, `scales`, `arrow`, `purrr`, `ggalluvial`, `h2o`
Before running the script:
- Install the required packages (e.g. `install.packages("tidyverse")`).
- Download necessary GPkg, GeoJSON, or CSV files (some manual steps are required, as indicated in the script).
- Set your working directory to the root of this repository so that file paths resolve correctly.
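The setup steps above can be sketched in R. This is a minimal sketch, not the project's own setup code: the helper name `missing_packages` is hypothetical, the package list is copied from the prerequisites above, and the install call is left commented out so that nothing is installed unintentionally. The check for a `data` directory assumes the repository layout shown in the project structure.

```r
# Packages listed in the prerequisites above.
required_packages <- c(
  "tidyverse", "sf", "magrittr", "janitor", "scales",
  "arrow", "purrr", "ggalluvial", "h2o"
)

# Hypothetical helper: return the packages that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required_packages)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}

# Check that the working directory looks like the repository root
# (assumed to contain the ./data directory).
if (!dir.exists("data")) {
  warning("No ./data directory found; set the working directory to the repository root.")
}
```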
- `OAC_Input.parquet` / `OAC_Input_IHS.parquet` – Raw and transformed input variables for each OA.
- `UK_OAC_Final.parquet` – OA-level cluster assignments (Supergroup, Group, Subgroup).
- Diagnostic Plots – Boxplots, alluvial charts, and distribution comparisons in `./plot/`.
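To inspect the outputs listed above, the Parquet files can be read with the `arrow` package. The following is a minimal sketch under stated assumptions: it assumes `UK_OAC_Final.parquet` sits in the working directory (this README does not state where the script writes it), the helper `read_if_present` is hypothetical, and no column names are assumed.

```r
# Hypothetical helper: read a Parquet file if it exists, otherwise return NULL.
read_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  if (!requireNamespace("arrow", quietly = TRUE)) {
    stop("The 'arrow' package is required to read Parquet files.")
  }
  arrow::read_parquet(path)
}

# Assumed location: adjust the path to wherever the script wrote its outputs.
oac_final <- read_if_present("UK_OAC_Final.parquet")

if (!is.null(oac_final)) {
  # Inspect the structure rather than assuming column names.
  str(oac_final)
}
```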
If you use or adapt this code or classification, please cite as follows:
Wyszomierski, J., Longley, P.A., Singleton, A.D., Gale, C. & O’Brien, O. (2024) A neighbourhood Output Area Classification from the 2021 and 2022 UK censuses. The Geographical Journal, 190, e12550. Available from: https://doi.org/10.1111/geoj.12550
Additionally, please adhere to relevant census data licenses and cite the data providers (ONS, NRS, NISRA) appropriately.