-
Notifications
You must be signed in to change notification settings - Fork 1
Description
Overview
The Planner application will be updated to support 1D lat/lon coordinate arrays (regular rectilinear grid) in addition to the existing 2D Lambert projection coordinates (curvilinear grid). This change was requested by the GSFC team.
Problem
The original interpolate_dataset method in sos/src/planner/main.py failed when processing datasets with 1D lat/lon arrays:
# Original code
lons = dataset.lon.values.flatten() # Lambert: (3,675,000,) | Lat/Lon: (2971,)
lats = dataset.lat.values.flatten() # Lambert: (3,675,000,) | Lat/Lon: (1651,)
points = list(zip(lons, lats)) # FAILS for 1D case - different array lengthsError:
INFO:root:Combining the two datasets.
ERROR:root:Planner on_change processing failed after 65.72 seconds: different number of values and points
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/joblib/_utils.py", line 109, in __call__
return self.func(**kwargs)
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 607, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 607, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/opt/src/planner/main.py", line 164, in interpolate_dataset
zi = griddata(
File "/usr/local/lib/python3.9/site-packages/scipy/interpolate/_ndgriddata.py", line 323, in griddata
ip = LinearNDInterpolator(points, values, fill_value=fill_value,
File "interpnd.pyx", line 326, in scipy.interpolate.interpnd.LinearNDInterpolator.__init__
File "interpnd.pyx", line 95, in scipy.interpolate.interpnd.NDInterpolatorBase.__init__
File "interpnd.pyx", line 104, in scipy.interpolate.interpnd.NDInterpolatorBase._set_values
File "interpnd.pyx", line 232, in scipy.interpolate.interpnd._check_init_shape
ValueError: different number of values and points
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/src/planner/main.py", line 1525, in _on_change_impl
combined_output_file, combined_dataset = self.generate_combined_dataset(
File "/opt/src/planner/main.py", line 223, in generate_combined_dataset
results = Parallel(
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 2072, in __call__
return output if self.return_generator else list(output)
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 1682, in _get_outputs
yield from self._retrieve()
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 1784, in _retrieve
self._raise_error_fast()
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 1859, in _raise_error_fast
error_job.get_result(self.timeout)
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 758, in get_result
return self._return_or_raise()
File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 773, in _return_or_raise
raise self._result
ValueError: different number of values and pointsRoot cause: For 2D arrays (Lambert), flattening produces same-length arrays that can be zipped. For 1D arrays (lat/lon), lat and lon have different lengths representing separate axes, not coordinate pairs.
Data Format Comparison
| Aspect | Lambert Projection | Lat/Lon |
|---|---|---|
| lat shape | 2D (1750, 2100) | 1D (1651,) |
| lon shape | 2D (1750, 2100) | 1D (2971,) |
| Grid type | Curvilinear | Regular rectilinear |
| Resolution | ~1km irregular | ~0.01° (~1km) regular |
Implementation Options
Option 1: Backward Compatible (Both Formats)
Auto-detect coordinate dimensionality and handle each case appropriately:
lon_vals = dataset.lon.values
lat_vals = dataset.lat.values
if lon_vals.ndim == 1 and lat_vals.ndim == 1:
# 1D coordinates (regular lat/lon grid) - create meshgrid
lon_grid, lat_grid = np.meshgrid(lon_vals, lat_vals)
lons = lon_grid.flatten()
lats = lat_grid.flatten()
else:
# 2D coordinates (curvilinear grid like Lambert) - flatten directly
lons = lon_vals.flatten()
lats = lat_vals.flatten()
points = list(zip(lons, lats))Option 2: Lat/Lon Only
Assume all input data uses 1D lat/lon arrays:
lon_vals = dataset.lon.values
lat_vals = dataset.lat.values
lon_grid, lat_grid = np.meshgrid(lon_vals, lat_vals)
lons = lon_grid.flatten()
lats = lat_grid.flatten()
points = list(zip(lons, lats))Comparison
| Aspect | Backward Compatible | Lat/Lon Only |
|---|---|---|
| Lines of code | ~10 | ~5 |
| Lambert support | Yes | No |
| Lat/lon support | Yes | Yes |
| Runtime overhead | Negligible | None |
| Previous demo compatibility | Yes | No |
Decision
Backward compatibility was chosen for the following reasons:
-
Previous demo reproducibility: An earlier demonstration used Lambert projection data. Maintaining backward compatibility allows re-running or validating against those results.
-
Minimal overhead: The difference is approximately 5 lines of code with a single
if/elsebranch and no performance impact. -
Data source flexibility: If Lambert-projected data is needed in the future, no code changes are required.
-
Risk asymmetry: Removing backward compatibility and adding it later requires code changes and testing. Keeping it now has negligible cost.
Changes Made
File Modified
sos/src/planner/main.py
Method Updated
interpolate_dataset (lines 140-163)
Change Summary
-
Coordinate handling: Added dimensionality detection to determine if coordinates are 1D (lat/lon) or 2D (Lambert), then creates meshgrid for 1D case or flattens directly for 2D case.
-
Variable extraction: Added handling for 3D variable arrays (time, y, x) by extracting the first time step before flattening.
Code Diff
# Before
lons = dataset.lon.values.flatten()
lats = dataset.lat.values.flatten()
points = list(zip(lons, lats))
for var_name in variables_to_interpolate:
values = dataset[var_name].values.flatten()# After
lon_vals = dataset.lon.values
lat_vals = dataset.lat.values
if lon_vals.ndim == 1 and lat_vals.ndim == 1:
lon_grid, lat_grid = np.meshgrid(lon_vals, lat_vals)
lons = lon_grid.flatten()
lats = lat_grid.flatten()
else:
lons = lon_vals.flatten()
lats = lat_vals.flatten()
points = list(zip(lons, lats))
for var_name in variables_to_interpolate:
var_values = dataset[var_name].values
if var_values.ndim == 3: # (time, y, x)
values = var_values[0].flatten()
else:
values = var_values.flatten()Testing
To verify the changes work with both formats:
# Test with Lambert projection data
lambert_ds = xr.open_dataset('path/to/lambert/LIS_HIST_202501020000.d01.nc')
assert lambert_ds.lat.ndim == 2
assert lambert_ds.lon.ndim == 2
# Test with lat/lon data
latlon_ds = xr.open_dataset('path/to/latlon/LIS_HIST_201901020000.d01.nc')
assert latlon_ds.lat.ndim == 1
assert latlon_ds.lon.ndim == 1Run the planner with each data format and verify outputs are generated correctly.