This repository contains the data, code, and derived results used to quantify how data-center load growth affects wholesale electricity prices, transmission cost allocation, and downstream burden and distributional outcomes across the United States.

PEESEgroup/Data-Center-Equality

Data Center–Driven Energy Burden and Inequality


How to cite

If you use this repository, please cite the accompanying manuscript (currently under review at a Nature Portfolio journal) and include a link to this repository together with the commit hash (or tagged release) you used.
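To record the commit hash for a citation, something like the following may help. It is a minimal sketch that assumes `git` is installed and that the code is run from inside a clone of this repository; the function name `current_commit_hash` is illustrative and not part of the repository.

```python
import subprocess
from typing import Optional


def current_commit_hash(repo_dir: str = ".") -> Optional[str]:
    """Return the full hash of HEAD in repo_dir, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git checkout, git is not installed, or repo_dir does not exist.
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(current_commit_hash() or "commit hash unavailable")
```

Quoting the full hash rather than a branch name keeps the citation tied to one exact state of the code and data.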


Required content (software supplement)

This repository provides:

  • Source code (analysis notebooks/scripts) under 00_code/
  • A small demonstration dataset: the repository includes processed inputs and derived artifacts that are sufficient for a smoke-test run of the figure generation. The two large regression panels are hosted on Google Drive (see below).

System requirements

Operating systems

  • Tested on: macOS and Linux (Ubuntu).
    (Exact OS versions are not recorded here; we recommend recording them when preparing a release for archival.)

Software

  • Python: 3.10+ recommended
  • Jupyter: required to run notebooks in 00_code/

Python dependencies (minimum set)

This project is built around standard scientific Python tooling. Typical dependencies include:

  • numpy, pandas, scipy
  • geopandas, shapely, pyproj (for geospatial joins/plots where applicable)
  • statsmodels, linearmodels (econometrics/regressions)
  • matplotlib, plotly (static + interactive figures)
  • openpyxl (Excel I/O)
  • tqdm (progress bars)
  • jupyterlab / notebook
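Before opening the notebooks, it can be useful to confirm that the packages listed above are importable. The sketch below is one way to do that with the standard library only. The package list simply mirrors the bullets above, and the helper name `missing_packages` is an assumption made for illustration; the repository may or may not ship its own environment file.

```python
from importlib.util import find_spec

# Top-level import names for the dependencies listed in this README.
REQUIRED = [
    "numpy", "pandas", "scipy",
    "geopandas", "shapely", "pyproj",
    "statsmodels", "linearmodels",
    "matplotlib", "plotly",
    "openpyxl", "tqdm",
]


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

The check only confirms that each package can be found; it does not verify versions.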

Non-standard hardware

  • No GPU required.
  • A standard laptop/desktop is sufficient.
  • Recommended: ≥16 GB RAM for the full pipeline; less is sufficient for the demo/smoke test.

Data

Included in this repository

Most inputs and all derived outputs needed to inspect results are stored in the repository:

  • cleaned/processed artifacts in results/
  • static previews and interactive HTML figures in figures/
  • supporting source files in the numbered data folders (02_fuel_mix/, 04_load_and_costs/, 04_rider/, etc.)

Large datasets (hosted on Google Drive)

Due to size constraints, the two panel datasets used for the price-impact regressions are not stored directly in this GitHub repository. They are hosted on Google Drive:

Recommended placement after download

  1. Download the files from the Drive folder.
  2. Keep filenames unchanged.
  3. Place them under:
    • 01_tables/ (ISO-level panel)
    • 01_tables_city/ (city-level panel)

If you prefer a different location, define the path once as a variable in your notebooks/scripts and document the change in 00_code/.
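The placement steps can be checked with a short script such as the one below. This is a sketch under stated assumptions: the environment variable `DC_PANEL_ROOT` and the function `panel_status` are hypothetical names introduced here, not something the repository defines. Because the panel filenames are not listed in this README, the script only counts files in each folder.

```python
import os
from pathlib import Path

# Folder names taken from the placement instructions above.
PANEL_DIRS = ("01_tables", "01_tables_city")


def panel_status(root=None):
    """Map each panel folder to the number of files found inside it.

    root defaults to the hypothetical DC_PANEL_ROOT environment variable,
    falling back to the current directory. Missing folders count as 0.
    """
    base = Path(root or os.environ.get("DC_PANEL_ROOT", "."))
    status = {}
    for name in PANEL_DIRS:
        folder = base / name
        if folder.is_dir():
            status[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            status[name] = 0
    return status


if __name__ == "__main__":
    for name, count in panel_status().items():
        note = "ok" if count else "empty or missing; download from Google Drive"
        print(f"{name}/: {count} file(s) ({note})")
```

Reading the location from a single variable keeps a non-default placement in one spot instead of scattered across notebooks.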


Running the code

Full reproduction (tables + figures)

  1. Download all files in the repository folders, including the two panel datasets hosted on Google Drive.
  2. Run the notebooks/scripts in 00_code/ in their intended order; if the filenames carry numeric prefixes, run them in ascending order.

Expected output: regenerated processed tables in results/ and updated figures in figures/.
Typical runtime: tens of minutes to a few hours, depending on hardware and regression settings.
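The reproduction steps above could be scripted roughly as follows. This is a sketch, not the authors' pipeline. It assumes the code in 00_code/ is stored as Jupyter notebooks, that sorting by filename matches the intended order (which holds for numeric prefixes only if they are zero-padded), and that `jupyter nbconvert` is available on the PATH. The function names are illustrative.

```python
import subprocess
from pathlib import Path


def ordered_notebooks(code_dir="00_code"):
    """Return notebooks under code_dir sorted by filename, skipping checkpoint copies."""
    notebooks = (
        p for p in Path(code_dir).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )
    return sorted(notebooks, key=lambda p: (p.name, str(p)))


def run_notebooks(code_dir="00_code"):
    """Execute each notebook in place, stopping at the first failure."""
    for nb in ordered_notebooks(code_dir):
        print(f"Executing {nb}")
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", str(nb)],
            check=True,  # raise CalledProcessError if a notebook fails
        )


if __name__ == "__main__":
    # Preview the order; call run_notebooks() once it looks right.
    for nb in ordered_notebooks():
        print(nb)
```

Listing the order first is deliberate: it lets you confirm the sequence before starting a run that may take hours.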


Repository structure

  • 00_code/ — Required code to reproduce the main calculations and figures.
  • 01_tables/ — Panel dataset for estimating the impact of ISO-level data center capacity on wholesale prices.
  • 01_tables_city/ — Panel dataset for estimating the impact of non-ISO cities’ data center capacity on wholesale prices.
  • 02_fuel_mix/ — Raw generation mix (fuel mix) data for each ISO.
  • 03_aggressive_path/ — Inputs used to construct the aggressive scenario for data-center capacity trajectories.
  • 04_load_and_costs/ — Source files for transmission charges and long-term load forecasts, plus extracted/cleaned tables.
  • 04_rider/ — PUC docket documents and extracted tables used to compute data-center transmission cost responsibility (“rider” allocation).
  • 05_burden/ — Price decomposition, projections, and fuel-price forecast files used in burden calculations.
  • 06_AI_accept/ — AI tool usage (adoption) data and related computed outputs.
  • 07_employment/ — State- and county-level employment data and processing outputs.
  • 08_tax/ — Raw and processed data related to tax incentives / abatements.
  • figures/ — Static (SVG) and interactive (HTML) visualizations.
  • results/ — Paper-facing outputs (processed tables, model outputs, summary artifacts).
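A quick way to confirm that a checkout matches this layout is sketched below. The folder names are copied from the list above; the helper `missing_dirs` is a hypothetical name introduced for illustration. Note that 01_tables/ and 01_tables_city/ may be absent until the Google Drive files have been placed.

```python
from pathlib import Path

# Top-level folders named in the repository structure list.
EXPECTED_DIRS = [
    "00_code", "01_tables", "01_tables_city", "02_fuel_mix",
    "03_aggressive_path", "04_load_and_costs", "04_rider",
    "05_burden", "06_AI_accept", "07_employment", "08_tax",
    "figures", "results",
]


def missing_dirs(root="."):
    """Return the expected folders that are not present under root."""
    base = Path(root)
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("Missing folders:", ", ".join(missing) if missing else "none")
```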

Figures (static previews and interactive HTML)

Static previews and their interactive HTML versions are stored in figures/:

  • Zones
  • Grid Cost Allocations: PJM 2025, MISO 2025, ERCOT 2025, CAISO 2025


Licensing and third-party data

  • Code in this repository is released under the MIT License (see LICENSE).
  • Data in the repository are compiled from a mix of sources (public datasets, ISO/RTO market data products, regulatory filings, etc.). Some raw inputs may carry their own terms of use. Users are responsible for complying with any upstream licensing and attribution requirements.

Contact

For questions or reproducibility issues, please open a GitHub issue in this repository.
