Add testing workflow and dependencies#9

Merged
lispandfound merged 20 commits into main from ci-testing-branch
Jan 6, 2026

@lispandfound

Adds a testing workflow to the data repository and properly specifies dependencies. Also:

  1. Fixes the Dunedin smoothing boundary to make the tests pass. A more elegant fix is probably warranted.
  2. Applies style fixes to the registry file.
  3. Removes registry paths that point to the missing file "grisborne_basement.png".
  4. Deletes the registry entry for the broken Chow model.

The Tests

Summary of NZCVM Registry Validation Tests

This test suite ensures the integrity of the New Zealand Community Velocity Model (NZCVM) by validating both the metadata registry and the underlying geophysical data files.


Registry Metadata Validation

  • Schema Enforcement: Uses the schema library to verify the structure of nzcvm_registry.yaml, ensuring valid Unix paths, Python identifiers for submodels, and correctly formatted URLs.
  • Dynamic Test Generation: Uses pytest_generate_tests to automatically create individual test cases for every entry defined in the registry.
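
The dynamic test generation described above could look roughly like the following. This is a minimal sketch, not the repository's code: the `REGISTRY` literal, the `tomography` list with `name` and `path` keys, and the fixture name `tomography_entry` are assumptions made for illustration. The real suite reads nzcvm_registry.yaml instead.

```python
# Hypothetical in-memory stand-in for the parsed nzcvm_registry.yaml.
REGISTRY = {
    "tomography": [
        {"name": "model_a", "path": "tomography/model_a.h5"},
        {"name": "model_b", "path": "tomography/model_b.h5"},
    ],
}


def pytest_generate_tests(metafunc):
    """pytest hook: create one test case per registry entry."""
    # Only parametrize tests that request the (hypothetical) fixture.
    if "tomography_entry" in metafunc.fixturenames:
        entries = REGISTRY["tomography"]
        metafunc.parametrize(
            "tomography_entry",
            entries,
            ids=[entry["name"] for entry in entries],
        )


def test_tomography_entry_has_path(tomography_entry):
    # Runs once per registry entry; the entry name appears in the test id.
    assert tomography_entry["path"].endswith(".h5")
```

With this pattern a failing entry shows up as its own test id in the pytest report, so one bad registry entry does not hide the results for the others.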

Tomography Model Tests

Verifies the high-resolution 3D velocity models stored in HDF5 format:

  • File Integrity: Ensures paths exist and files are valid, non-empty HDF5 containers.
  • Metadata Alignment: Cross-references elevation levels in the YAML registry against the actual datasets inside the HDF5 file.
  • Structural Consistency: Validates that the dimensions of data arrays ($V_p$, $V_s$, and $\rho$) match the (Latitude $\times$ Longitude) grid dimensions.
  • Geospatial Logic:
    • Latitudes must be between $-90$ and $90$.
    • Longitudes must be between $0$ and $185$.
    • Coordinates must be strictly monotonic (ascending or descending).
  • Physical Constraints: Ensures no NaN values exist and data falls within predefined bounds:
    • $V_p$: $0$ to $11.0$ km/s
    • $V_s$: $0$ to $7.0$ km/s
    • $\rho$ (Density): $0$ to $5.0$ g/cm³
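
The geospatial and physical checks listed above can be sketched without any third-party library. The function names below are hypothetical, the bounds are treated as inclusive (an assumption), and the inputs are plain flattened sequences. The actual tests operate on datasets read from the HDF5 files.

```python
import math

# Bounds as listed above: (lower, upper) per quantity.
QUALITY_BOUNDS = {
    "vp": (0.0, 11.0),   # km/s
    "vs": (0.0, 7.0),    # km/s
    "rho": (0.0, 5.0),   # g/cm^3
}


def is_strictly_monotonic(values):
    """True if values are strictly ascending or strictly descending."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def check_grid(latitudes, longitudes):
    """Return a list of problems found in the coordinate axes."""
    problems = []
    if not all(-90 <= lat <= 90 for lat in latitudes):
        problems.append("latitude out of range")
    if not all(0 <= lon <= 185 for lon in longitudes):
        problems.append("longitude out of range")
    if not is_strictly_monotonic(latitudes):
        problems.append("latitudes not strictly monotonic")
    if not is_strictly_monotonic(longitudes):
        problems.append("longitudes not strictly monotonic")
    return problems


def check_quality(name, values):
    """Return a list of problems for one flattened data array."""
    lower, upper = QUALITY_BOUNDS[name]
    problems = []
    if any(math.isnan(v) for v in values):
        problems.append(f"{name} contains NaN")
    # NaN fails every comparison, so exclude it here to avoid reporting it twice.
    if any(not (lower <= v <= upper) for v in values if not math.isnan(v)):
        problems.append(f"{name} outside [{lower}, {upper}]")
    return problems
```

Returning a list of problems rather than asserting on the first failure is a design choice of this sketch: it lets a single run report every violation for a given model.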

Basin & Surface Validation

Checks the geometry and depth surfaces of sedimentary basins:

  • Boundary Geometry: Uses shapely to verify that basin boundaries (GeoJSON) are valid, non-empty, and composed of closed Polygons.
  • Surface Coverage: Verifies that the HDF5 elevation surfaces spatially contain the entire basin boundary.
  • Smoothing Transitions: Ensures that "smoothing boundaries" are correctly nested within the primary basin boundaries.
  • Surface Data Quality: Validates that surface elevations are realistic (between $\pm 10{,}000$ m) and contain no missing data.
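
The suite relies on shapely for these geometry checks. As a library-free illustration only, the sketch below approximates three of them. It assumes that a boundary ring is a list of (longitude, latitude) pairs, as in GeoJSON, and that a surface is a regular latitude/longitude grid, so coverage reduces to an extent test. All function names are hypothetical.

```python
def ring_is_closed(ring):
    """A closed ring has at least 4 points and repeats its first point last."""
    return len(ring) >= 4 and tuple(ring[0]) == tuple(ring[-1])


def surface_covers_boundary(surface_lats, surface_lons, ring):
    """True if every boundary vertex lies within the surface grid extent."""
    lat_min, lat_max = min(surface_lats), max(surface_lats)
    lon_min, lon_max = min(surface_lons), max(surface_lons)
    return all(
        lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
        for lon, lat in ring
    )


def elevations_are_realistic(elevations, limit_m=10_000.0):
    """True if all elevations lie within +/- limit_m and none are missing."""
    # abs(NaN) <= limit_m is False, so missing values fail this check too.
    return all(abs(z) <= limit_m for z in elevations)
```

The extent test is only equivalent to true containment for a rectangular grid. For any other surface footprint, a real geometry library is the safer choice.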

Vs30 & Submodel Tests

  • Vs30 Verification: Validates near-surface velocity data, ensuring values are within $0$ to $2000$ m/s.
  • 1D Velocity Models (vm1d): Parses 1D text-based model files (DEF HST format) to verify:
    • Correct header identification.
    • Positive layer thicknesses.
    • Valid seismic quality factors ($Q_p$, $Q_s$).
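
A 1D model check along these lines might be sketched as follows. The file layout is an assumption, since it is not spelled out here: one header line (taken to be `DEF HST`) followed by whitespace-separated rows of thickness, Vp, Vs, rho, Qp, and Qs. "Valid" quality factors are assumed to mean strictly positive. `Layer`, `parse_vm1d`, and `validate_layers` are hypothetical names.

```python
from typing import NamedTuple


class Layer(NamedTuple):
    thickness: float
    vp: float
    vs: float
    rho: float
    qp: float
    qs: float


def parse_vm1d(text, expected_header="DEF HST"):
    """Parse the assumed layout: a header line, then six columns per layer."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or lines[0].strip() != expected_header:
        raise ValueError("unexpected or missing header")
    layers = []
    for line in lines[1:]:
        # strip().split() tolerates leading, trailing, and repeated whitespace.
        fields = line.strip().split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}")
        layers.append(Layer(*map(float, fields)))
    return layers


def validate_layers(layers):
    """Return a list of problems found in the parsed layers."""
    problems = []
    for i, layer in enumerate(layers):
        if layer.thickness <= 0:
            problems.append(f"layer {i}: non-positive thickness")
        if layer.qp <= 0 or layer.qs <= 0:
            problems.append(f"layer {i}: non-positive Q factor")
    return problems
```

For example, `validate_layers(parse_vm1d(text))` returns an empty list when every layer in `text` has a positive thickness and positive quality factors.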

test: update QUALITY_BOUNDS values for vp and vs in test_registry

- Increase upper bound for vp from 10.0 to 11.0
- Increase upper bound for vs from 6.0 to 6.5
- Add note clarifying values are not physically derived

test: remove longitude spacing tolerance check from tomography geo gridpoints test

chore(pyproject): add empty types optional dep

tests: update quality bounds to ayushi's values

ci: add github actions

test: add comprehensive basin, vs30, and submodel validation tests

- Add Basin, Vs30, and Submodel TypedDicts for schema validation
- Parametrize tests for basin, vs30, and submodel entries from registry
- Add tests for basin boundaries, surfaces, smoothing, and containment
- Add tests for vs30 file existence, HDF5 validity, and gridpoint checks
- Add tests for submodel data existence and content validation
- Improve test coverage and robustness for registry-driven datasets

test: refactor registry tests for clarity and remove unused code

- Remove duplicate and unused imports in test_registry.py
- Decorate test_nzcvm_registry_schema with @no_type_check for type checking bypass
- Eliminate redundant and commented-out test code for basin smoothing boundaries
- Remove unused variables and streamline vs30 tests for readability

deps: add shapely dependency

ci: add all extras to ty check

ci: add yamllint config

ci: fix pytest action

ci: use cached LFS checkout

build: require Python 3.11 or higher in pyproject.toml

test: improve boundary geometry assertions in basin smoothing test

- Check now asserts the basin *boundary* contains the smoothing
boundary, instead of the basin *polygon* (which includes the interior).

ci: use big-data runner group for extra space

ci: use ubuntu latest but remove unused junk
…ntry

- Update Canterbury basement file paths to use correct lowercase naming
- Fix spacing in Wellington basement file path
- Remove invalid Gisborne basement image reference from registry
@gemini-code-assist

Summary of Changes

Hello @lispandfound, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the NZCVM data repository by integrating a new testing framework and formalizing dependency management. The changes ensure that the geophysical data and its metadata registry adhere to defined standards and are free from common errors, improving overall data reliability and maintainability. It also includes several targeted fixes and stylistic improvements to the registry file itself.

Highlights

  • Testing Workflow: A comprehensive testing workflow has been added to the data repository to ensure the integrity and consistency of the New Zealand Community Velocity Model (NZCVM) data.
  • Dependency Management: Project dependencies are now properly specified using a new 'pyproject.toml' file, including dedicated optional dependencies for testing and development.
  • Registry Cleanup and Style Fixes: The 'nzcvm_registry.yaml' file has undergone significant cleanup, including reformatting, removal of outdated entries, correction of file paths, and general style improvements for readability.
  • Dunedin Smoothing Boundary Fix: The Dunedin smoothing boundary definition has been adjusted to resolve issues, likely to ensure tests pass as mentioned in the PR description.


Ignored Files
  • Ignored by pattern: .github/workflows/** (4)
    • .github/workflows/pytest.yml
    • .github/workflows/ruff.yml
    • .github/workflows/types.yml
    • .github/workflows/yamllint.yml


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new pyproject.toml file to manage project dependencies and metadata, and a .yamllint configuration file to enforce a 200-character line length. The nzcvm_registry.yaml file has been updated by reformatting elev arrays for readability, removing two tomography entries (CHOW2020_EP2020_MIX, EP2025), standardizing basement file naming to lowercase 'basement' across multiple basin entries, removing an image from Gisborne_v21p7, and correcting various indentation and spacing issues throughout the basin and submodel sections. Additionally, a block of coordinate data was removed from regional/Dunedin/Dunedin_smoothing.txt.

The review comments highlight several issues in tests/test_registry.py: the attrs library needs to be added as a test dependency in pyproject.toml, the wiki_images path validation logic was incorrect and needed to be relative to the repository root, and the test_submodel_data_is_valid test was improved to include rho validation and use pytest-subtests for better error reporting.

Further suggestions included making the latitude and longitude checks in test_surface_geo_gridpoints more robust by accepting either strictly ascending or strictly descending order, and using line.strip().split() for parsing in read_smoothing_boundary and parse_submodel_data for improved robustness.

lispandfound and others added 10 commits January 5, 2026 13:02
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@lispandfound

@sungeunbae type checking does work but fails in the installation step because of version resolution issues with numba.

@lispandfound lispandfound merged commit e895787 into main Jan 6, 2026
4 checks passed
@lispandfound lispandfound deleted the ci-testing-branch branch January 6, 2026 22:31