This repository contains the data, code, and derived results used to quantify how data-center load growth affects wholesale electricity prices, transmission cost allocation, and downstream burden and distributional outcomes across the United States.
If you use this repository, please cite the accompanying manuscript (currently under review at a Nature Portfolio journal) and include a link to this repository together with the commit hash or tagged release you used.
This repository provides:
- ✅ Source code (analysis notebooks/scripts) under 00_code/
- ✅ A small dataset to demonstrate the code: the repository includes processed inputs/derived artifacts that allow a smoke-test run for figure generation; the two large regression panels are hosted on Google Drive (see below).
- Tested on: macOS and Linux (Ubuntu).
(If you are preparing a release for archival, we recommend recording exact versions here.)
- Python: 3.10+ recommended
- Jupyter: required to run notebooks in 00_code/
This project is built around standard scientific Python tooling. Typical dependencies include:
- numpy, pandas, scipy
- geopandas, shapely, pyproj (for geospatial joins/plots where applicable)
- statsmodels, linearmodels (econometrics/regressions)
- matplotlib, plotly (static + interactive figures)
- openpyxl (Excel I/O)
- tqdm (progress bars)
- jupyterlab / notebook
- No GPU required.
- A standard laptop/desktop is sufficient.
- Recommended: ≥16 GB RAM for the full pipeline; less is sufficient for the demo/smoke test.
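To record exact versions for an archival release, or to confirm that an environment has the packages listed above, a short check along these lines may help. It is a minimal sketch rather than part of the repository's code: the function names (`collect_versions`, `python_ok`, `format_report`) are illustrative, and the distribution names are assumed to match the package names in the dependency list.

```python
import sys
from importlib import metadata

# Distribution names copied from the dependency list above (assumed to match
# the names used on PyPI).
PACKAGES = [
    "numpy", "pandas", "scipy",
    "geopandas", "shapely", "pyproj",
    "statsmodels", "linearmodels",
    "matplotlib", "plotly",
    "openpyxl", "tqdm",
    "jupyterlab", "notebook",
]


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_ok(minimum=(3, 10)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def format_report(versions):
    """Render a pinned-requirements style report, flagging missing packages."""
    lines = [f"python=={sys.version.split()[0]}"]
    for name, version in sorted(versions.items()):
        lines.append(f"{name}=={version}" if version else f"{name}: NOT INSTALLED")
    return "\n".join(lines)
```

Calling `print(format_report(collect_versions(PACKAGES)))` prints one line per package, which can be pasted into the release notes; `python_ok()` reports whether the interpreter meets the recommended 3.10+.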
Most inputs and all derived outputs needed to inspect results are stored in the repository:
- cleaned/processed artifacts in results/
- static previews and interactive HTML figures in figures/
- supporting source files in the numbered data folders (02_fuel_mix/, 04_load_and_costs/, 04_rider/, etc.)
Due to size constraints, the two panel datasets used for the price-impact regressions are not stored directly in this GitHub repository. They are hosted on Google Drive:
- Google Drive folder: https://drive.google.com/drive/folders/1YjEEAJrFbZg64ZUPRTsWpSEI4e6IMdKe?usp=sharing
Recommended placement after download
- Download the files from the Drive folder.
- Keep filenames unchanged.
- Place them under:
  - 01_tables/ (ISO-level panel)
  - 01_tables_city/ (city-level panel)
If you prefer a different location, set a path variable in your notebooks/scripts and document it in 00_code/.
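Before running the regressions, it can be useful to confirm that the downloaded panels landed in the expected folders. The sketch below assumes only the two folder names given above; the helper names (`find_panel_files`, `missing_panels`) are hypothetical, and because the panel filenames are not listed here, the check only tests that each folder contains at least one file.

```python
from pathlib import Path

# Folder names from the placement instructions above.
PANEL_DIRS = ("01_tables", "01_tables_city")


def find_panel_files(repo_root):
    """Return {folder name: sorted list of non-hidden filenames found in it}."""
    root = Path(repo_root)
    found = {}
    for name in PANEL_DIRS:
        folder = root / name
        if folder.is_dir():
            found[name] = sorted(
                p.name
                for p in folder.iterdir()
                if p.is_file() and not p.name.startswith(".")
            )
        else:
            # A missing folder is treated the same as an empty one.
            found[name] = []
    return found


def missing_panels(repo_root):
    """List the panel folders that are absent or contain no files."""
    return [name for name, files in find_panel_files(repo_root).items() if not files]
```

For example, `missing_panels(".")` run from the repository root returns an empty list once both panels are in place. If the data live elsewhere, pass that location instead of `"."`.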
- Download all files in the repository folders, including the two panel datasets hosted on Google Drive.
- Run the notebooks/scripts in 00_code/ in the intended order (if the filenames carry numeric prefixes, run them in ascending order).
Expected output: regenerated processed tables in results/ and updated figures in figures/.
Typical runtime: depends on hardware and regression settings (tens of minutes to a few hours).
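One way to run the notebooks in order without opening each one is sketched below. It assumes the notebooks sit directly in 00_code/ and that any ordering is encoded as a leading number in the filename, which is not confirmed here; the function names are illustrative. Execution goes through the standard `jupyter nbconvert --execute` command, and `--inplace` overwrites each notebook with its executed outputs.

```python
import re
import subprocess
from pathlib import Path


def ordered_notebooks(code_dir):
    """Return notebooks sorted by leading numeric prefix; unprefixed ones go last."""
    def sort_key(path):
        match = re.match(r"(\d+)", path.name)
        if match:
            # Compare prefixes as integers so that 2_... sorts before 10_...
            return (0, int(match.group(1)), path.name)
        return (1, 0, path.name)

    return sorted(Path(code_dir).glob("*.ipynb"), key=sort_key)


def run_notebooks(code_dir="00_code", timeout_s=None):
    """Execute each notebook in order, stopping at the first failure."""
    for notebook in ordered_notebooks(code_dir):
        print(f"Running {notebook.name}")
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", str(notebook)],
            check=True,          # raise CalledProcessError if a notebook fails
            timeout=timeout_s,   # optional wall-clock limit per notebook, in seconds
        )
```

Calling `ordered_notebooks("00_code")` first shows the order that would be used, so it can be checked against the intended sequence before `run_notebooks()` is started.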
- 00_code/ — Required code to reproduce the main calculations and figures.
- 01_tables/ — Panel dataset for estimating the impact of ISO-level data center capacity on wholesale prices.
- 01_tables_city/ — Panel dataset for estimating the impact of non-ISO cities’ data center capacity on wholesale prices.
- 02_fuel_mix/ — Raw generation mix (fuel mix) data for each ISO.
- 03_aggressive_path/ — Inputs used to construct the aggressive scenario for data-center capacity trajectories.
- 04_load_and_costs/ — Source files for transmission charges and long-term load forecasts, plus extracted/cleaned tables.
- 04_rider/ — PUC docket documents and extracted tables used to compute data-center transmission cost responsibility (“rider” allocation).
- 05_burden/ — Price decomposition, projections, and fuel-price forecast files used in burden calculations.
- 06_AI_accept/ — AI tool usage (adoption) data and related computed outputs.
- 07_employment/ — State- and county-level employment data and processing outputs.
- 08_tax/ — Raw and processed data related to tax incentives / abatements.
- figures/ — Static (SVG) and interactive (HTML) visualizations.
- results/ — Paper-facing outputs (processed tables, model outputs, summary artifacts).
Zones:
Grid Cost Allocations:
- Code in this repository is released under the MIT License (see LICENSE).
- Data in the repository are compiled from a mix of sources (public datasets, ISO/RTO market data products, regulatory filings, etc.). Some raw inputs may have their own terms of use. Users are responsible for complying with any upstream licensing/attribution requirements.
For questions or reproducibility issues, please open a GitHub issue in this repository.
