CF-Compliant Coordinate Metadata with Validation #239
- Implement set_coordinate_attributes() for spatial/vertical coordinates
- Add comprehensive metadata for lat, lon, plev*, olevel, alevel, etc.
- Externalize coordinate definitions to YAML for easy maintenance
- Integrate into DefaultPipeline after variable attributes
- Add configuration options (xarray_set_coordinate_attributes, xarray_set_coordinates_attribute)
- Set 'coordinates' attribute on data variables
- Add 24 unit tests (all passing)
- Support both CMIP6 and CMIP7 coordinate conventions
This enables proper CF compliance and xarray/cf-xarray compatibility for all pymorize outputs.
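For readers who want a feel for what this commit does, here is a minimal sketch using plain dictionaries in place of xarray objects, so it runs without dependencies. The `COORDINATE_METADATA` table, the `set_coordinate_attributes_sketch` name, the dataset layout, and the rule for building the `coordinates` string are all illustrative assumptions, not the actual pycmor implementation or YAML contents.

```python
# Hypothetical stand-in for a few entries of the coordinate metadata YAML.
COORDINATE_METADATA = {
    "lat": {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"},
    "lon": {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
    "plev19": {
        "standard_name": "air_pressure",
        "units": "Pa",
        "axis": "Z",
        "positive": "down",
    },
}


def set_coordinate_attributes_sketch(dataset):
    """Fill in missing CF attributes on known coordinates.

    `dataset` is a dict with "coords" and "data_vars", each mapping a name
    to {"dims": tuple, "attrs": dict}. This mimics xarray but is not xarray.
    """
    for name, coord in dataset["coords"].items():
        if name.endswith("_bnds"):
            continue  # bounds variables are left untouched
        for key, value in COORDINATE_METADATA.get(name, {}).items():
            # Only fill gaps here; conflicting values are a validation concern.
            coord["attrs"].setdefault(key, value)
    for var in dataset["data_vars"].values():
        # Assumption: list every known coordinate found on the variable's dims.
        names = [dim for dim in var["dims"] if dim in dataset["coords"]]
        if names:
            var["attrs"]["coordinates"] = " ".join(names)
    return dataset
```

Calling it on a dataset with a bare `lat` coordinate adds `standard_name`, `units`, and `axis` to that coordinate and a `coordinates` attribute to each data variable that uses it.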
- Add xarray_validate_coordinate_attributes config option
- Support 4 validation modes: ignore, warn, error, fix
- Default mode 'warn' logs conflicts without breaking pipeline
- 'fix' mode auto-corrects wrong metadata values
- 'error' mode enforces strict CF compliance
- Add 6 comprehensive validation tests (30/30 tests passing)
- Enhanced logging for all validation scenarios
- Fully backward compatible (default preserves existing behavior)
This prevents silent data quality issues and gives users control over how to handle incorrect coordinate metadata in source data.
- Create doc/coordinate_attributes.rst with full feature documentation
- Cover automatic attribute setting, validation modes, and configuration
- Include usage examples for default pipeline, custom pipelines, and standalone
- Document all supported coordinates (horizontal, vertical, scalar)
- Explain validation modes (ignore, warn, error, fix) with examples
- Add troubleshooting section and logging examples
- Add to doc/index.rst table of contents
- Documentation builds successfully with Sphinx
- Run isort with --profile black on coordinate_attributes.py and __init__.py
- Run black formatter on coordinate_attributes.py and test file
- All pre-commit checks now pass (isort, black, flake8, yamllint)
- Tests still pass (30/30)
Implements Part 2 of dimension handling: mapping source dimension names to CMIP table requirements.
Core Features:
- Semantic dimension detection using multiple strategies:
  * Name pattern matching (regex for lat*, lon*, lev*, etc.)
  * Standard name attribute checking
  * Axis attribute checking
  * Value range analysis (detect lat/lon/pressure from values)
- Automatic dimension name mapping (e.g., 'latitude' → 'lat', 'lev' → 'plev19')
- Support for user-specified mappings
- Dimension renaming to match CMIP requirements
- Validation with configurable modes (ignore/warn/error)
New Files:
- src/pycmor/std_lib/dimension_mapping.py (550 lines)
- tests/unit/test_dimension_mapping.py (31 tests, all passing)
Configuration:
- xarray_enable_dimension_mapping: Enable/disable (default: yes)
- dimension_mapping_validation: Validation mode (default: warn)
- dimension_mapping: User-specified mapping dict (optional)
Integration:
- Added to DefaultPipeline before set_coordinate_attributes
- Exported in std_lib as map_dimensions function
- Follows same pattern as other pipeline functions
Tests: 31/31 passing
- 10 tests: Dimension type detection
- 7 tests: CMIP dimension mapping
- 4 tests: Complete mapping creation
- 3 tests: Applying mappings
- 3 tests: Validation
- 4 tests: Pipeline function wrapper
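The four detection strategies listed in this commit can be sketched roughly as follows. This is a dependency-free illustration, not the code in dimension_mapping.py: the regexes, the lookup tables, the numeric thresholds, the order in which strategies are tried, and the `detect_dimension_type` name are all assumptions made for the example.

```python
import re

# Hypothetical patterns and lookup tables; the real ones may differ.
NAME_PATTERNS = {
    "latitude": re.compile(r"^lat", re.IGNORECASE),
    "longitude": re.compile(r"^lon", re.IGNORECASE),
    "vertical": re.compile(r"^(lev|plev|depth)", re.IGNORECASE),
}
STANDARD_NAMES = {
    "latitude": "latitude",
    "longitude": "longitude",
    "air_pressure": "vertical",
}
AXES = {"Y": "latitude", "X": "longitude", "Z": "vertical"}


def detect_dimension_type(name, attrs=None, values=None):
    """Return 'latitude', 'longitude', 'vertical', or None if undetected."""
    attrs = attrs or {}
    # Strategy 1: name pattern matching.
    for dim_type, pattern in NAME_PATTERNS.items():
        if pattern.match(name):
            return dim_type
    # Strategy 2: standard_name attribute.
    dim_type = STANDARD_NAMES.get(attrs.get("standard_name"))
    if dim_type:
        return dim_type
    # Strategy 3: axis attribute.
    dim_type = AXES.get(str(attrs.get("axis", "")).upper())
    if dim_type:
        return dim_type
    # Strategy 4: value range analysis (weakest signal, so tried last).
    if values:
        low, high = min(values), max(values)
        if -90 <= low and high <= 90:
            return "latitude"
        if -180 <= low and high <= 360:
            return "longitude"
        if low > 0 and high > 1000:
            return "vertical"  # crude guess: looks like pressure in Pa
    return None
```

Value ranges are inherently ambiguous (a regional longitude axis can fit inside [-90, 90]), which is one reason a sketch like this tries the name and attribute checks first.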
Added detailed documentation covering:
- Overview and motivation
- Four detection strategies (patterns, standard_name, axis, values)
- CMIP dimension mapping for all coordinate types
- Usage in default pipeline, custom pipelines, and standalone
- Configuration options (enable/disable, validation modes, user mapping)
- Five complete examples with before/after code
- Integration with coordinate attributes
- Detailed logging output
- Troubleshooting guide
- Performance and technical details
Updated doc/index.rst to include dimension_mapping in table of contents.
Allows users to override CMIP table dimension names on a per-rule basis.
This addresses the need for custom dimension names in output files when
CMIP table dimensions are not appropriate for specific use cases.
New Features:
- allow_override parameter in create_mapping() and validate_mapping()
- dimension_mapping_allow_override configuration option (default: yes)
- Flexible mode: allows any output dimension names
- Strict mode: enforces CMIP table dimension names
- Per-rule dimension_mapping configuration support
Use Cases:
- Custom output formats with non-CMIP dimension names
- Legacy compatibility with existing tools/workflows
- Alternative standards (e.g., CF-only, not CMIP)
- Experimental variables with non-standard dimensions
Configuration:
# Global setting
dimension_mapping_allow_override: yes  # or no
# Per-rule override
rules:
  - model_variable: temp
    cmor_variable: ta
    dimension_mapping:
      lev: my_custom_level  # Override plev19 → my_custom_level
      latitude: my_lat  # Override lat → my_lat
      longitude: my_lon  # Override lon → my_lon
Validation:
- Flexible mode (allow_override=yes): warns on dimension count mismatch
- Strict mode (allow_override=no): errors on non-CMIP dimension names
This maintains backward compatibility (default is flexible mode) while
providing strict validation when needed for CMIP submission.
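The difference between the two modes can be illustrated with a small sketch. The `validate_mapping_sketch` function below is hypothetical: its signature, its return value, and its messages are assumptions for illustration and do not reproduce the real validate_mapping().

```python
def validate_mapping_sketch(mapping, cmip_dimensions, allow_override=True):
    """Check a {source_dim: output_dim} mapping against CMIP table dimensions.

    Returns a list of warning messages. In strict mode
    (allow_override=False) a non-CMIP output name raises ValueError instead.
    """
    warnings_found = []
    output_names = list(mapping.values())
    if allow_override:
        # Flexible mode: any names are accepted, only the count is checked.
        if len(output_names) != len(cmip_dimensions):
            warnings_found.append(
                f"expected {len(cmip_dimensions)} dimensions, got {len(output_names)}"
            )
        return warnings_found
    # Strict mode: every output name must come from the CMIP table.
    unknown = sorted(set(output_names) - set(cmip_dimensions))
    if unknown:
        raise ValueError(f"non-CMIP dimension names in strict mode: {unknown}")
    return warnings_found
```

With the per-rule override from the configuration above, flexible mode would accept `my_custom_level`, `my_lat`, and `my_lon` and at most warn, whereas strict mode would reject all three.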
Added comprehensive test coverage for the allow_override functionality:
- 6 new tests covering flexible and strict modes
- Test for user override with custom dimension names
- Test for strict mode validation rejection
- Test for partial override (mixed custom/CMIP names)
- Test for pipeline function integration
- Fixed existing test to use strict mode
Updated documentation with:
- Allow Override Mode configuration section
- Example 6: Overriding CMIP dimension names
- Example 7: Per-rule override configuration
- Use cases and best practices
Test Results: 37/37 passing (31 original + 6 new)
All tests pass, feature is fully documented and ready for use.
Fixed formatting issues detected by black formatter. All 37 tests still passing.
I might forget to write this down otherwise: it would be great if this could also have the accessor pattern we have for certain other parts of manipulating the xarray object.
pgierz
left a comment
I'm about halfway done, more comments to follow.
- Apply black formatting to coordinate_attributes.py and dimension_mapping.py
- All linting checks (flake8, isort, black) now pass
Co-authored-by: Paul Gierz <pgierz@awi.de>
siligam
left a comment
Good idea and definitely worthwhile long-term. Right now this YAML just drives the new coordinate metadata lookup and we rely on the explicit keys/structure in src/pycmor/data/coordinate_metadata.py plus the existing validation tests to keep it consistent for the prep-release branch. I’d rather keep this PR focused on delivering the metadata and dimension-mapping plumbing, but I can follow up with a small helper that validates the YAML (e.g., via a lightweight schema using dataclasses/pydantic or voluptuous/jsonschema) in a subsequent cleanup. That follow-up could then load every entry through the schema so the metadata file automatically gets sanity-checked before we ship the next tagged release.
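As a rough idea of what such a follow-up helper could look like, here is a sketch using only standard-library dataclasses. It takes an already-parsed mapping (what a YAML loader would return) rather than reading the file. The `CoordinateEntry` fields, the choice to make `standard_name` required, and the `load_entries` name are assumptions for illustration; the real metadata file may carry other keys.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_AXES = {"X", "Y", "Z", "T"}
ALLOWED_POSITIVE = {"up", "down"}


@dataclass(frozen=True)
class CoordinateEntry:
    """Hypothetical schema for one coordinate's metadata."""

    standard_name: str
    units: Optional[str] = None
    axis: Optional[str] = None
    positive: Optional[str] = None

    def __post_init__(self):
        if not isinstance(self.standard_name, str) or not self.standard_name:
            raise ValueError("standard_name must be a non-empty string")
        if self.axis is not None and self.axis not in ALLOWED_AXES:
            raise ValueError(f"axis must be one of {sorted(ALLOWED_AXES)}")
        if self.positive is not None and self.positive not in ALLOWED_POSITIVE:
            raise ValueError("positive must be 'up' or 'down'")


def load_entries(raw):
    """Validate every entry of a parsed metadata mapping.

    Unknown keys surface as TypeError from the dataclass constructor and
    are re-raised as ValueError naming the offending coordinate.
    """
    entries = {}
    for name, fields in raw.items():
        try:
            entries[name] = CoordinateEntry(**fields)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"invalid metadata for {name!r}: {exc}") from exc
    return entries
```

Running every entry through a loader like this in a unit test would catch misspelled keys or invalid `axis` values before a release.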
siligam
left a comment
The comments on coordinate_attributes.py are addressed.
From internal developer chat:
…cumentation examples
Pull Request: Complete Dimension Handling - Coordinate Metadata & Dimension Mapping
Overview
This PR implements complete dimension handling for pycmor, addressing both the "output side" (CF-compliant coordinate metadata) and the "input side" (dimension mapping from source to CMIP standards). It includes automatic semantic dimension detection, flexible dimension name mapping with per-rule override capability, YAML-based metadata definitions, configurable validation, comprehensive testing (67 tests), and full documentation.
Problem Solved
Before
Coordinate Metadata (Output Side):
Dimension Mapping (Input Side):
After
Part 1: Coordinate Attributes
Part 2: Dimension Mapping
Combined:
Key Features
Part 1: Coordinate Attributes (Output Side)
1. Automatic Coordinate Metadata Setting
- standard_name, axis, units, positive on coordinate variables
- coordinates attribute on data variables
2. YAML-Based Metadata Definitions
- src/pycmor/data/coordinate_metadata.yaml
- dimensionless_mappings.yaml)
3. Metadata Validation
Part 2: Dimension Mapping (Input Side)
4. Semantic Dimension Detection
5. Intelligent CMIP Mapping
6. Per-Rule Dimension Override
7. Comprehensive Testing
8. Full Documentation
Supported Coordinates
Horizontal
Vertical - Pressure Levels
Vertical - Ocean Levels
Vertical - Atmosphere Model Levels
Vertical - Altitude/Height/Depth
Scalar Coordinates
Other
Changes
New Files
Part 1: Coordinate Attributes
- src/pycmor/data/coordinate_metadata.yaml (420 lines) - Coordinate metadata definitions
- src/pycmor/std_lib/coordinate_attributes.py (298 lines) - Core implementation
- tests/unit/test_coordinate_attributes.py (612 lines) - 30 comprehensive tests
- doc/coordinate_attributes.rst (526 lines) - Full documentation
Part 2: Dimension Mapping
- src/pycmor/std_lib/dimension_mapping.py (618 lines) - Core implementation
- tests/unit/test_dimension_mapping.py (771 lines) - 37 comprehensive tests
- doc/dimension_mapping.rst (890 lines) - Full documentation
Modified Files
- src/pycmor/core/config.py - Added 6 configuration options (3 for each part)
- src/pycmor/core/pipeline.py - Integrated both features into DefaultPipeline
- src/pycmor/std_lib/__init__.py - Added wrapper function exports
- doc/index.rst - Added both documentation files to table of contents
Configuration Options Added
Coordinate Attributes:
Dimension Mapping:
Usage Examples
Automatic (Default Pipeline)
Custom Pipeline
Standalone
Validation Examples
Development Mode (Default)
Logs warnings for conflicts, doesn't break pipeline.
Production with Trusted Data
No validation overhead, preserves all source metadata.
Strict Validation
Fails fast on bad data, good for CI/CD.
Auto-correction
Automatically corrects known metadata issues.
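The four modes described above differ only in how they react to a conflict between source metadata and the expected values. A minimal, dependency-free sketch of that behaviour follows. The function name, its signature, the logger name, and the choice to always fill in missing attributes are assumptions for illustration, not the pycmor implementation.

```python
import logging

logger = logging.getLogger("coordinate_validation_sketch")

VALID_MODES = {"ignore", "warn", "error", "fix"}


def validate_attributes_sketch(name, attrs, expected, mode="warn"):
    """Return a copy of `attrs` reconciled with `expected` under `mode`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown validation mode: {mode!r}")
    result = dict(attrs)  # never mutate the caller's dictionary
    for key, want in expected.items():
        if key not in result:
            result[key] = want  # assumption: missing attributes are always set
            continue
        have = result[key]
        if have == want or mode == "ignore":
            continue  # 'ignore' keeps source metadata as-is
        message = f"{name}: {key}={have!r} conflicts with expected {want!r}"
        if mode == "error":
            raise ValueError(message)  # fail fast
        if mode == "fix":
            logger.info("%s; correcting", message)
            result[key] = want
        else:  # mode == "warn"
            logger.warning("%s", message)
    return result
```

For a `lat` coordinate whose source `units` is `degrees`, `warn` and `ignore` would keep `degrees`, `fix` would return `degrees_north`, and `error` would raise.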
Test Results
Part 1: Coordinate Attributes
$ conda run -n pycmor-dev python -m pytest tests/unit/test_coordinate_attributes.py -v
======================== 30 passed, 4 warnings in 0.70s ========================
Test Coverage:
Part 2: Dimension Mapping
$ conda run -n pycmor-dev python -m pytest tests/unit/test_dimension_mapping.py -v
======================== 37 passed, 4 warnings in 0.58s ========================
Test Coverage:
Combined:
======================== 67 passed, 8 warnings in 1.28s ========================
Example Output
Before
After
Benefits
For Users
For Developers
For CMIP Compliance
Performance
Backward Compatibility
✅ Fully backward compatible
warn (non-breaking)
Documentation
Full RST documentation added to Sphinx docs:
Documentation builds successfully:
$ conda run -n pycmor-dev make html
# Build successful (some pre-existing warnings)
Future Work
This PR provides complete dimension handling (both input and output sides). Possible future enhancements:
Commits
Part 1: Coordinate Attributes
Part 2: Dimension Mapping
4. 3476503 - feat: Add dimension mapping from source data to CMIP standards
5. f1c96e9 - docs: Add comprehensive RST documentation for dimension mapping
6. 7edb261 - feat: Add per-rule dimension override capability
7. 8731ddd - feat: Add tests and documentation for dimension override feature
Checklist
Related Issues
This PR provides complete dimension handling for pymorize:
Together, these features ensure proper CF compliance, xarray compatibility, and flexible dimension naming for all CMIP6/CMIP7 outputs.
Screenshots/Examples
N/A (backend feature, no UI changes)
Additional Notes
- files.py (existing behavior preserved)
- lat_bnds, plev_bnds) are automatically skipped
- src/pycmor/data/coordinate_metadata.yaml
Ready for review! 🚀