# Guam HSI (Hazard Susceptibility Index) Calculator
The Hazard Susceptibility Viewer (HSV) was created for the Pacific Region to document and display areas within islands and territories that, by the nature of their demographic makeup, infrastructure siting, or topographic features, may require extra resources or earlier messaging in the event of hazardous weather. The Centers for Disease Control and Prevention's Social Vulnerability Index (CDC SVI), upon which the HSV is based, covers the continental United States but does not extend to many of the islands and territories in the Pacific Region, motivating the creation of this program. A number of the variables used in the HSV were expressly selected to reflect the demographic makeup of our first test case: the island of Guam.

Expanding on the work of Paulino et al. (2021), the HSV incorporates spatial layers representing critical infrastructure locations and topographic features in addition to demographic and socioeconomic layers.
The calculator pulls Decennial Census variables (2020 by default) and computes alias fields to mirror the CDC/ATSDR Social Vulnerability Index (SVI). Because every transformation is driven by a CSV, you can swap in a different variable list (e.g. Guam, US states) without touching the Python code.
```
project-root/
│
├── configs/
│   └── variables.csv      ← One row per alias. THIS drives all calculations.
│
├── src/
│   ├── fetch.py           ← Generic Census-API downloader.
│   ├── compute_hsi.py     ← Adds aliases and computes their values from the configured expressions.
│   └── main.py            ← Command-line driver & offline-cache manager.
│
├── cache/                 ← Auto-created. Holds raw CSV snapshots per dataset.
└── hsi_output.csv         ← Example final output (path is user-selectable).
```
```shell
python -m venv venv                 # create isolated environment (optional)
source venv/bin/activate            # on Windows: venv\Scripts\Activate
pip install -r requirements.txt     # pandas, requests, numpy only
python -m src.main --state 66 --year 2020 --geography place --outfile output/hsi_output.csv
python src/join_csv_to_shapefile.py cache/tl_2020_66_place.zip output/hsi_output.csv PLACEFP place --output output/hsi_output.shp
```
The first run downloads data from the Census API and writes a copy under `cache/…csv`. If the API is unreachable on later runs, the program re-uses the cached copy automatically.
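The download-with-fallback behaviour can be sketched roughly as below. Function names and the response handling are illustrative assumptions, not the project's actual API; the real logic lives in `fetch.py` and `main.py`:

```python
import os

import pandas as pd
import requests


def load_dataset(url, cache_path):
    """Try a live download; on any HTTP/network error,
    fall back to the cached snapshot if one exists."""
    try:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        rows = resp.json()  # Census API returns [header, row, row, ...]
        df = pd.DataFrame(rows[1:], columns=rows[0])
        df.to_csv(cache_path, index=False)  # refresh the snapshot
        return df
    except requests.RequestException:
        if os.path.exists(cache_path):
            return pd.read_csv(cache_path)  # offline fallback
        raise
```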
| Column | Purpose | Example |
|---|---|---|
| `alias` | Short, human-friendly name created in the output | `EP_POV150` |
| `dataset` | Census product slug (matches the API URL segment) | `dpgu` (2020 Guam Data Profile) |
| `variable` | Either a raw Census code or any arithmetic expression referencing raw codes | `(E_POV150 / S1701_C01_001E) * 100` |

Expressions have no limits: use `+ - * / ( )`, and `**` for powers (e.g. `**0.5` for a square root).

To reference a previously declared variable, write `df['{alias}']`, replacing `{alias}` with the alias of your choosing. To calculate percentile ranks, use `df['{alias}'].rank(pct=True).round(4)`.
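Putting these conventions together, a `configs/variables.csv` can look like the fragment below. The `EP_POV150` expression is taken from the example above; the `EPL_POV150` rank row is an illustrative addition following the rank convention just described.

```csv
alias,dataset,variable
EP_POV150,dpgu,(E_POV150 / S1701_C01_001E) * 100
EPL_POV150,dpgu,df['EP_POV150'].rank(pct=True).round(4)
```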
| Step | Code fragment | What happens & why |
|---|---|---|
| 1 | `_parse_args()` | Reads CLI flags; every flag has a sensible default so beginners can run the script "as-is". |
| 2 | `fetch.group_variable_codes_by_dataset()` | Scans `variables.csv` to build `{dataset → [raw codes…]}`. This works for any CSV – no hard-coding. |
| 3 | Loop over datasets | For each slug: try a live download via `fetch.download_data()`; on failure, fall back to `cache/` if a snapshot exists; verify every requested code is present (`_assert_all_vars_present`). |
| 4 | Merge frames | A left-join on the geography keys (`state`, `place`, …) produces one wide frame `df_raw`. |
| 5 | `hsi()` | Delegates to `compute_hsi.py` to add the aliases. |
| 6 | Column reorder | Geography keys first → tidy output. |
| 7 | `to_csv(args.outfile)` | Final flat-file output for ArcGIS/QGIS. |
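The merge in step 4 can be sketched as a fold over the per-dataset frames. This is a simplified illustration of the idea, not the exact code in `src/main.py`:

```python
from functools import reduce

import pandas as pd


def merge_frames(frames, geo_keys):
    """Left-join each dataset's frame on the composite geography key,
    producing one wide frame (step 4 above)."""
    return reduce(
        lambda left, right: left.merge(right, on=geo_keys, how="left"),
        frames,
    )


# Example: two dataset frames keyed on (state, place) merge into one wide frame.
frames = [
    pd.DataFrame({"state": ["66"], "place": ["19700"], "DP1_0001C": [153836]}),
    pd.DataFrame({"state": ["66"], "place": ["19700"], "DP1_0025C": [49000]}),
]
df_raw = merge_frames(frames, ["state", "place"])
```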
- **Regex discovery (`VAR_RE`)** – A strict pattern ensures only valid Census tokens are captured from free-form expressions.
- **Bucket by dataset (`group_variable_codes_by_dataset`)** – Ensures that each API call hits exactly one product root (`…/dec/dpgu` vs `…/dec/sdgu`, etc.).
- **Chunking (`CHUNK_SIZE = 50`)** – The Census API limits `get=…` to 50 variables. The code slices long lists and merges the partial DataFrames.
- **Geography helper (`geokeys_for`)** – Maintains the composite primary key (`state` + `county` + `tract`, etc.). Future geographies can be added in one lookup table.
- **Data cleaning** – Numeric coercion converts `"1234"` → `1234.0`; the sentinels `-888888888` / `-999999999` are mapped to `NaN` so downstream calculations are safe.
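The chunking step is simple list slicing; a minimal sketch (the helper name here is illustrative, not necessarily what `fetch.py` calls it):

```python
CHUNK_SIZE = 50  # the Census API caps "get=..." at 50 variables per request


def chunked(codes, size=CHUNK_SIZE):
    """Slice a long variable list into API-sized batches.
    Each batch becomes one request; the partial DataFrames
    are merged afterwards on the geography keys."""
    return [codes[i:i + size] for i in range(0, len(codes), size)]


# 120 requested codes -> three requests of 50, 50 and 20 variables.
batches = chunked([f"DP1_{i:04d}C" for i in range(120)])
```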
| Phase | Function | Detail |
|---|---|---|
| 1 | `_load_alias_map` | Reads `variables.csv` into `{alias → expression}`. |
| 2 | `_evaluate_aliases` | If the expression is a single token → fast copy. Otherwise, wraps every token as `df['TOKEN']` and calls `pandas.eval(engine='python')` – arithmetic only, no arbitrary code execution. Errors (divide-by-zero, missing column) yield `NaN` so the pipeline never aborts mid-run. |
| 3 | `hsi()` | Public entry point used by `main.py`. Returns a new DataFrame (the original is untouched). |
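Phase 2 can be illustrated with a minimal sketch. The function below follows the description above (token wrapping plus `pandas.eval`), but the actual `_evaluate_aliases` in `compute_hsi.py` may differ in detail:

```python
import re

import numpy as np
import pandas as pd

TOKEN_RE = re.compile(r"\b[A-Z]{1,4}\d{0,3}_[0-9]{4}[A-Z]?\b")


def evaluate_alias(df, alias, expression):
    # Single raw token -> fast copy, no evaluation needed.
    if expression in df.columns:
        df[alias] = df[expression]
        return df
    # Wrap every Census token as df['TOKEN'] so the expression can only
    # perform column lookups and arithmetic -- no arbitrary code execution.
    wrapped = TOKEN_RE.sub(lambda m: f"df['{m.group(0)}']", expression)
    try:
        df[alias] = pd.eval(wrapped, engine="python", local_dict={"df": df})
    except Exception:
        df[alias] = np.nan  # errors never abort the pipeline
    return df


# Hypothetical codes: 5 of 50 people -> 10 %, 10 of 40 -> 25 %.
df = pd.DataFrame({"E_1234": [5.0, 10.0], "S1_0001E": [50.0, 40.0]})
evaluate_alias(df, "EP_TEST", "(E_1234 / S1_0001E) * 100")
```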
| Task | How‑to |
|---|---|
| Add a new variable | Append a row to variables.csv with the correct dataset slug and either the raw code or an expression. |
| Switch to a different territory or year | CLI flags: --state, --year, --geography. |
| Support new geographies | Add an entry in fetch.geokeys_for() and pass the keyword on the CLI. |
| Increase API rate limits | Supply --api-key <your‑Census‑key> (free registration). |
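The geography lookup mentioned above can be as small as one dictionary. The sketch below follows the composite keys described earlier (`state` + `county` + `tract`, etc.); the exact structure inside `fetch.geokeys_for()` may differ:

```python
# One entry per supported geography; adding a new one is a one-line change.
GEO_KEYS = {
    "state": ["state"],
    "county": ["state", "county"],
    "tract": ["state", "county", "tract"],
    "place": ["state", "place"],
}


def geokeys_for(geography):
    """Return the composite primary key used to merge dataset frames."""
    return GEO_KEYS[geography]
```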
- **Network / API outage** – Any HTTP error triggers a fallback to the cached CSV so workflows continue uninterrupted.
- **Missing variables** – The script halts with a clear "missing X variables" message, pointing you to gaps in `variables.csv`.
- **Expression errors** – Problematic expressions resolve to `NaN`; the rest of the pipeline (percentiles, sums) still executes.
- **Calculation/download errors** – Be sure to carefully review the downloaded Census data, as some values may not be correct.
To map the data, a places shapefile is needed. A places shapefile for the state you are working with can be downloaded from https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=Places. This file has already been downloaded for Guam and can be found at `cache/tl_2020_66_place.zip`.

With a places shapefile you can call the following script to join the computed HSI values to the shapefile. With the optional flag described below, the script can also remove any columns that start with "DP" that were downloaded earlier using the API.

```shell
python src/join_csv_to_shapefile.py cache/tl_2020_66_place.zip output/hsi_output.csv PLACEFP place --output output/hsi_output.shp
```

Replace the following parameters as appropriate:

- `cache/tl_2020_66_place.zip`: the path to the zipped or unzipped shapefile
- `output/hsi_output.csv`: the path to the generated CSV file
- `PLACEFP`: the column in the shapefile to join on
- `place`: the column in the CSV file to join on
- `--output output/hsi_output.shp`: the output file to be created
- `--remove_data` (optional): removes any columns that start with "DP"; the prefix can be changed to any starting letters you'd like to exclude from the output
- `VAR_RE` = `r"\b[A-Z]{1,4}\d{0,3}_[0-9]{4}[A-Z]?\b"` – Captures only well-formed Census codes, avoiding false matches like `DP1_0001C_extra`.
- `TOKEN_RE` – originally `r"[A-Za-z0-9_]+"`, now changed to `\b[A-Z]{1,4}\d{0,3}_[0-9]{4}[A-Z]?\b`. Finds the tokens inside an alias expression so they can be wrapped as `df["TOKEN"]` for safe evaluation.
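A quick demonstration of the pattern (the `DP1_*` codes below are placeholder tokens, not variables from `variables.csv`):

```python
import re

VAR_RE = re.compile(r"\b[A-Z]{1,4}\d{0,3}_[0-9]{4}[A-Z]?\b")

# Well-formed codes inside a free-form expression are all captured ...
expr = "(DP1_0001C + DP1_0002C) / DP1_0025C * 100"
hits = VAR_RE.findall(expr)   # ['DP1_0001C', 'DP1_0002C', 'DP1_0025C']

# ... while a malformed token is rejected outright.
bad = VAR_RE.findall("DP1_0001C_extra")   # []
```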
| Value | Meaning | Action taken |
|---|---|---|
| `-888888888` | "The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area." | Replaced with `NaN` |
| `-999999999` | "The estimate or margin of error is not applicable or not available for the requested variable." | Replaced with `NaN` |
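The coercion-plus-sentinel cleaning can be expressed in a few lines of pandas; a minimal sketch (the helper name is illustrative):

```python
import pandas as pd

SENTINELS = [-888888888, -999999999]


def clean_numeric(series):
    """Coerce strings to floats (non-numeric values become NaN),
    then map the two Census sentinel codes to NaN as well."""
    s = pd.to_numeric(series, errors="coerce")
    return s.mask(s.isin(SENTINELS))


cleaned = clean_numeric(pd.Series(["1234", "-999999999", "N/A"]))
```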
| Flag | Default | Description |
|---|---|---|
--state |
66 |
FIPS code (two digits) – 66 = Guam |
--year |
2020 |
Decennial Census year |
--geography |
place |
API keyword: state, county, tract, place, … |
--config |
configs/variables.csv |
Path to the alias/variable mapping CSV |
--outfile |
hsi_output.csv |
Destination CSV |
--cache-dir |
cache |
Directory for raw dataset snapshots |
--api-key |
None | Optional Census key for higher rate limits |
Anthony Berardi, Colorado State University
Kevin Worthington, Colorado State University
Jarrod Loerzel, NOAA
Liz Batty, NOAA
OpenAI. (2025). ChatGPT (May 7th version) [Large language model]. https://chat.openai.com/chat