Vs30 refactor by AndrewRidden-Harper · Pull Request #42 · ucgmsim/Vs30

AndrewRidden-Harper · 2026-01-27T09:06:06Z

This PR is a major refactor of the Vs30 package to improve readability, modularity, and testability. The mathematical expressions are equivalent to those in the original version, but use clearer variable names and simplified forms. Regression tests confirm that the output of the refactored codebase is consistent with that of the original version.

Design changes

Centralized configuration

All parameters that influence the calculated Vs30 values are defined in a single config.yaml file, rather than being hardcoded across various source files as in the original codebase.
Parameters can optionally be overridden via command-line arguments.
Configuration is validated at load time using Pydantic.
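
A minimal sketch of the load, override, and validate pattern described above, assuming Pydantic is installed. The field names (max_points, cluster_subsample_step, n_proc) and the load_config helper are hypothetical and do not reflect the package's actual schema. The YAML parsing step is replaced by a plain dict so the example stays self-contained.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Vs30Config(BaseModel):
    """Hypothetical subset of the parameters held in config.yaml."""

    max_points: int = Field(gt=0)  # cap on observations per MVN update
    cluster_subsample_step: int = Field(default=100, ge=1)
    n_proc: int = Field(default=1, ge=1)


def load_config(file_values: dict, cli_overrides: Optional[dict] = None) -> Vs30Config:
    """Merge CLI overrides over the file values, then validate once."""
    overrides = {k: v for k, v in (cli_overrides or {}).items() if v is not None}
    return Vs30Config(**{**file_values, **overrides})


# Values as they might be parsed from config.yaml (illustrative only).
file_values = {"max_points": 500, "cluster_subsample_step": 100}

# An override of None means "flag not given", so the file value is kept.
config = load_config(file_values, cli_overrides={"n_proc": 4, "max_points": None})

# Invalid values are rejected at load time rather than deep inside the pipeline.
error_count = 0
try:
    load_config({"max_points": 0})
except ValidationError as err:
    error_count = len(err.errors())
```

Validating the merged dictionary, rather than the file and the overrides separately, means a bad command-line value fails in the same way as a bad value in the file.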

Modular CLI

A Typer-based CLI (vs30) exposes the full pipeline as well as individual stages (update-categorical-vs30-models, make-initial-vs30-raster, spatial-fit, etc.), making it possible to run or re-run specific steps independently.
A new compute-at-locations command computes Vs30 at specific latitude/longitude points without generating full raster grids, which is efficient for querying a small number of sites.

Input Vs30 data

The original version had multiple modes of operation that selected which observed Vs30 dataset to use (e.g., "original" combined three hardcoded sources with dataset-specific filtering and downsampling; "cpt" loaded CPT data), with different processing and Bayesian update paths for each mode.
In the refactored version, data preparation and filtering are the user's responsibility. All observed Vs30 values that are passed in are used directly.
Observations are provided in one or both of two categories:
- Clustered observations (typically dense CPT-inferred Vs30 values) — processed with DBSCAN clustering so that spatially grouped measurements are not over-weighted in the Bayesian update.
- Independent observations — treated as individual measurements.
Both kinds of datasets can be used in the same run, with at least one being required.
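
To illustrate the clustering step, here is a small sketch using scikit-learn's DBSCAN. The eps and min_samples values are illustrative, and the inverse-cluster-size weighting is only one assumed way to stop dense groups from dominating. It is not taken from the package's Bayesian update.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical observation coordinates in metres (easting, northing).
coords = np.array(
    [
        [0.0, 0.0],
        [10.0, 0.0],
        [0.0, 10.0],
        [10.0, 10.0],  # the first four points form a dense group
        [5000.0, 5000.0],  # isolated site
    ]
)

# eps and min_samples are illustrative values, not the package defaults.
labels = DBSCAN(eps=50.0, min_samples=3).fit_predict(coords)

# DBSCAN assigns the label -1 to points that belong to no cluster.
weights = np.ones(len(coords))
for label in np.unique(labels[labels >= 0]):
    members = labels == label
    # Assumed weighting: each cluster contributes as much as one observation.
    weights[members] = 1.0 / members.sum()
```

With these inputs the four nearby points share one cluster label and the distant point is labelled -1, so it keeps full weight.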

Vs30 map generation

When spatially adjusting the Vs30 map by fitting multivariate normal (MVN) distributions to observed Vs30 values, the original package looped over all pixels in the raster (processing them in blocks), computing the distance from each pixel to all observations to determine which (if any) observations would affect it.
The refactored version reverses the approach: it first loops over observations, using bounding boxes to identify which map pixels will be affected, then loops only over those affected pixels for the more expensive MVN conditioning calculations.
When the number of observations is much smaller than the number of pixels in the map, this is substantially faster because the majority of pixels (those far from any observation) are never processed.
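
The observation-first search can be sketched as follows. This is a simplified illustration using NumPy on a regular grid. The function name, the max_dist parameter, and the grid layout are assumptions, not the package's implementation.

```python
import numpy as np


def affected_pixel_mask(obs_xy, x_centres, y_centres, max_dist):
    """Mark pixels inside any observation's bounding box of half-width max_dist."""
    mask = np.zeros((len(y_centres), len(x_centres)), dtype=bool)
    for x, y in obs_xy:
        # Pixel centres are monotonic, so each selection is one contiguous run.
        cols = np.flatnonzero(np.abs(x_centres - x) <= max_dist)
        rows = np.flatnonzero(np.abs(y_centres - y) <= max_dist)
        if cols.size and rows.size:
            mask[rows[0] : rows[-1] + 1, cols[0] : cols[-1] + 1] = True
    return mask


# Hypothetical 100 x 100 grid with 100 m pixels and two observations.
x_centres = np.arange(100) * 100.0 + 50.0
y_centres = np.arange(100) * 100.0 + 50.0
obs_xy = np.array([[1050.0, 1050.0], [8050.0, 3050.0]])

mask = affected_pixel_mask(obs_xy, x_centres, y_centres, max_dist=500.0)

# Only these pixels would go on to the expensive MVN conditioning step.
rows, cols = np.nonzero(mask)
```

In this toy case each observation touches an 11 x 11 block, so 242 of the 10,000 pixels are selected and the rest are never visited.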
For clustered observations, the bounding box search uses a subsampled set of observations (by default, every 100th observation within each cluster). This is a good approximation because all observations within a spatial cluster affect nearly the same set of map pixels. Unclustered (isolated) observations are always included in full. The subsampling step is configurable and can be set to 1 to use all observations, at the cost of a slower bounding box search.
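
A sketch of the subsampling rule described above, assuming cluster labels follow the DBSCAN convention in which -1 marks unclustered observations. The function name and its step argument are hypothetical.

```python
import numpy as np


def subsample_for_bbox_search(cluster_labels, step=100):
    """Return the indices of the observations used in the bounding box search.

    Keeps every step-th member of each cluster and every unclustered
    observation (label -1).
    """
    keep = [np.flatnonzero(cluster_labels == -1)]
    for label in np.unique(cluster_labels[cluster_labels >= 0]):
        members = np.flatnonzero(cluster_labels == label)
        keep.append(members[::step])
    return np.sort(np.concatenate(keep))


# Hypothetical labels: a cluster of 250 points, a cluster of 40, and 3 isolated points.
labels = np.concatenate([np.zeros(250, int), np.ones(40, int), np.full(3, -1)])

selected = subsample_for_bbox_search(labels, step=100)
all_points = subsample_for_bbox_search(labels, step=1)
```

Here the search uses 7 observations instead of 293, while a step of 1 recovers the full set.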
Separately, the max_points parameter caps the number of nearest observations used for the per-pixel MVN update, limiting the cost of the matrix inversion at each pixel.
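
The effect of max_points can be illustrated with a generic conditional-normal update for a single pixel. This is a sketch under stated assumptions: the exponential correlation model, the corr_length parameter, the absence of measurement noise, and all argument names are illustrative choices, not the package's formulation.

```python
import numpy as np


def condition_pixel(pixel_xy, prior_mean, prior_std, obs_xy, obs_residual,
                    obs_std, corr_length, max_points):
    """Condition one pixel on its nearest observations.

    obs_residual is the observed value minus the prior model at each
    observation, and obs_std is the prior model standard deviation there.
    """
    dist = np.linalg.norm(obs_xy - pixel_xy, axis=1)
    nearest = np.argsort(dist)[:max_points]  # caps the size of the linear solve
    xy, resid, sd = obs_xy[nearest], obs_residual[nearest], obs_std[nearest]

    # Assumed exponential correlation between locations.
    pairwise = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    cov_obs = np.outer(sd, sd) * np.exp(-pairwise / corr_length)
    cov_cross = prior_std * sd * np.exp(-dist[nearest] / corr_length)

    # Solve cov_obs @ weights = cov_cross instead of forming an explicit inverse.
    weights = np.linalg.solve(cov_obs, cov_cross)
    post_mean = prior_mean + weights @ resid
    post_var = prior_std**2 - weights @ cov_cross
    return post_mean, np.sqrt(max(post_var, 0.0))


# Hypothetical inputs: three observations, of which only the nearest two are used.
obs_xy = np.array([[100.0, 0.0], [0.0, 300.0], [9000.0, 9000.0]])
obs_residual = np.array([0.2, -0.1, 0.4])
obs_std = np.array([0.5, 0.5, 0.5])

mean, std = condition_pixel(
    np.array([0.0, 0.0]), 5.7, 0.5, obs_xy, obs_residual, obs_std,
    corr_length=1000.0, max_points=2,
)
```

The linear solve scales roughly with the cube of the number of observations used, which is why capping that number bounds the per-pixel cost.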

Multiprocessing

Both the raster pipeline and compute-at-locations support multiprocessing. The spatial adjustment work is divided into chunks of affected pixels distributed across worker processes.
BLAS thread oversubscription is managed at runtime using threadpoolctl, rather than requiring environment variables to be set before import.
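
A sketch of how chunked work and runtime BLAS limits can be combined, assuming threadpoolctl is installed. The process_chunk and run_in_chunks functions are hypothetical stand-ins and the doubling operation is a placeholder for the real per-pixel update.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from threadpoolctl import threadpool_limits


def process_chunk(pixel_values):
    """Stand-in for the per-pixel MVN update on one chunk of affected pixels."""
    # Limit BLAS to one thread inside each worker so that n_proc workers
    # do not each start a full BLAS thread pool.
    with threadpool_limits(limits=1, user_api="blas"):
        return pixel_values * 2.0  # placeholder computation


def run_in_chunks(pixel_values, n_proc, n_chunks):
    """Split the affected pixels into chunks and process them."""
    chunks = np.array_split(pixel_values, n_chunks)
    if n_proc == 1:
        results = [process_chunk(chunk) for chunk in chunks]
    else:
        with ProcessPoolExecutor(max_workers=n_proc) as pool:
            # pool.map returns results in chunk order, so merging is a concatenate.
            results = list(pool.map(process_chunk, chunks))
    return np.concatenate(results)


values = np.arange(10, dtype=float)
sequential = run_in_chunks(values, n_proc=1, n_chunks=4)
```

When n_proc is greater than 1, the call should sit under an `if __name__ == "__main__":` guard in a script so that worker processes can import the module safely. Setting the limit inside the worker at runtime avoids having to export variables such as OMP_NUM_THREADS before NumPy is imported.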

Test suite

A comprehensive pytest suite covers all modules: configuration, category assignment, Bayesian updates, raster creation, hybrid modifications, spatial adjustment, the CLI, parallelism, and full-pipeline regression tests.
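
As an illustration of the style of check such a suite might contain, here is a toy regression test using pytest.approx. The posterior_mean function is a hypothetical normal-normal update written for this example, and the reference values are computed by hand rather than taken from the original codebase.

```python
import numpy as np
import pytest


def posterior_mean(prior_mean, prior_var, obs_mean, obs_var):
    """Toy stand-in for a function under regression test."""
    weight = prior_var / (prior_var + obs_var)
    return prior_mean + weight * (obs_mean - prior_mean)


def test_posterior_mean_matches_reference():
    # In a real regression test the reference would come from the original code.
    reference = np.array([5.5, 5.8])
    result = posterior_mean(np.array([5.0, 6.0]), 0.25, np.array([6.0, 5.6]), 0.25)
    # approx compares element-wise within a relative tolerance.
    assert result == pytest.approx(reference, rel=1e-9)
```

Comparing with a tolerance rather than exact equality keeps the test stable when an algebraically equivalent rewrite changes floating-point rounding.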

AndrewRidden-Harper · 2026-02-25T08:26:44Z

I've addressed all suggestions. The main changes are:

- All orchestration/pipeline code has been moved into a new module, pipeline.py.
- There are now only three CLI commands: map (make a Vs30 map), points (Vs30 at provided points), and update-priors (Bayesian update of the assumed prior Vs30 for terrain and geology categories).
- The GitHub Actions workflows are now in line with the current approach in our other repos (uses uv etc.).

Typer only supports auto-generated --long-form options OR explicit custom names, not both. Dropping short flags (-c, -v, -t, -i, -l, -o, -d) lets Typer derive intuitive --option-names from parameter names without cluttering the code with redundant explicit strings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
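
A small sketch of the behaviour this commit relies on: when no explicit option name is given, Typer derives the long option from the parameter name. The command bodies and parameter names here are hypothetical.

```python
from pathlib import Path

import typer
from typer.testing import CliRunner

app = typer.Typer()


@app.command()
def points(
    locations_csv: Path = typer.Option(..., help="CSV of sites to query."),
    output_dir: Path = typer.Option(Path("output"), help="Where to write results."),
):
    """Hypothetical command: the options become --locations-csv and --output-dir."""
    typer.echo(f"{locations_csv} -> {output_dir}")


@app.command()
def grid(config: Path = typer.Option(Path("config.yaml"))):
    """Second hypothetical command, so the app keeps explicit subcommand names."""
    typer.echo(f"grid using {config}")


# Invoke the CLI in-process, as a test would.
result = CliRunner().invoke(app, ["points", "--locations-csv", "sites.csv"])
```

No option strings appear in the function signatures, which is the reduction in clutter that the commit message describes.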

Previously each worker process created its own tqdm bar inside compute_spatial_adjustment_at_points(), causing rapid flickering between competing progress displays. Now progress is controlled by the caller via an optional progress_bar parameter.
- Sequential path: one smooth bar covering geology + terrain.
- Parallel path: splits into many small chunks instead of n_proc large chunks, with a single parent bar updating as each chunk completes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
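
The caller-owned progress pattern can be sketched without any dependencies. CountingBar is a hypothetical stand-in that exposes the same update(n) method as a tqdm bar, and adjust_points and adjust_in_chunks are illustrative names, not the package's functions.

```python
class CountingBar:
    """Minimal stand-in with the same update(n) method as a tqdm bar."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


def adjust_points(points, progress_bar=None):
    """Hypothetical worker: it reports progress only if the caller passes a bar."""
    results = []
    for point in points:
        results.append(point * 2)  # placeholder for the real adjustment
        if progress_bar is not None:
            progress_bar.update(1)
    return results


def adjust_in_chunks(points, chunk_size, progress_bar=None):
    """Caller-side loop: the single parent bar advances once per finished chunk."""
    results = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start : start + chunk_size]
        results.extend(adjust_points(chunk))  # workers stay silent
        if progress_bar is not None:
            progress_bar.update(len(chunk))
    return results


bar = CountingBar(total=10)
out = adjust_in_chunks(list(range(10)), chunk_size=3, progress_bar=bar)
```

Because the worker never constructs a bar itself, the same function serves both the sequential path and the chunked path without competing displays. Smaller chunks give the parent bar more frequent updates.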

… grid updates
- Add Geology/Terrain prefix to all progress bars so users understand why stages run twice.
- Use many small chunks for grid spatial adjustment so the progress bar updates smoothly.
- Suppress progress bar for single-chunk bbox search in favor of a simple status message.
- Split points sequential path into separate geology and terrain progress bars.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AndrewRidden-Harper · 2026-02-27T08:11:46Z

Getting Vs30 values at Cesar's list of locations has proven to be a very useful stress test. Actually using the CLI tool revealed that the CLI flags were confusing and the progress bars unhelpful, so I've improved both. Unfortunately (or perhaps fortunately), I also learned that when processing lists of points (rather than a grid), the coastal distance adjustment was not being properly applied. That will probably take a few days to fix, so I'll mark this PR as a draft until it's fixed.

AndrewRidden-Harper added 30 commits December 4, 2025 15:45

vectorized bounding box around measurements

c51a329

use rasterio coordinate transforms

4ec2c75

immediately exclude invalid map values

b090427

basic prototype

3b02d6f

bug fixes

00344e6

in progress

c9f9e64

add typer

f345744

working pieces but not all brought together

4788062

improved id raster creation

dd3a5d9

improvements for making initial vs30 rasters

a68f170

improved structure

4704724

small fixes

70bd683

basic cluster_update functionality

5176b5b

improved handling of clustered data

edb2a4a

initial vs30 raster compatible with new column names

97ca0a2

create the hybrid geology, coast dist, slope raster

f110d5e

renamed modules

f63c1c2

del old vs30 code

98e6afd

del more old vs30 code

482b663

reorganised

0f48dc4

renamed to spatial for clarity

21a98c1

cli command for spatial update (mvn)

3de4a50

pipeline v1 (untested)

0d45450

resolved some small differences between old and new

a1cb7fa

updates for consistency with old code

d6009bb

more descriptive filenames

598af30

improved output filename clarity

f490ce0

more descriptive file and variable names

64ab83b

add flag in config for doing the Bayesian update

55d3ec7

only take inputs from config.yaml for simplicity

d810376

AndrewRidden-Harper added 21 commits February 25, 2026 08:55

del unnecessary comments in ensure_shapefile_extracted()

9ba864a

use rasterio indexes for sampling instead of list comprehension

7afa264

use pytest approx

c4766fe

revert to list comprehension in src.sample to extract

4aa14ba

src.sample() gives an array per point, even for a single band, so we still need s[0] to extract the scalar value

del CLI plot-posterior-values

6f4d2dc

created pipeline.py and improved CLI

a846032

Align GitHub workflows to other repos by using uv and PyPI

6ed3ca0

update to python >= 3.11 for qcore-utils compatibility

17fd590

GDAL setup in GitHub workflows

a0fb7ea

More GDAL setup in GitHub workflows

19bca09

GDAL version fix

1b97e96

pin GDAL Python bindings to system libgdal version in CI workflows

40f3e63

GitHub Actions deptry test fix

6fe735d

del unnecessary matplotlib from dependencies

dc1ecbb

fix ty typecheck

efaf8c0

fix more ty type check issues

8a88461

update tests to call funcs in pipeline.py instead of using CLI

7c51dde

add numpydoc config to pyproject.toml

20c3e29

fix numpydoc linter issues and shorter module docstrings

c2c24d1

del git-extension.yml as no git in pyproject.toml

fe96998

unify chunk merging for any n_proc

254b77d

AndrewRidden-Harper requested review from claudio525 and lispandfound February 25, 2026 08:26

AndrewRidden-Harper and others added 4 commits February 27, 2026 12:20

Rename map CLI command to grid to avoid Python builtin shadowing workaround

418ebbc

AndrewRidden-Harper marked this pull request as draft February 27, 2026 08:11

AndrewRidden-Harper wants to merge 131 commits into master from vs30_refactor


3 participants
