Skip to content

Change From Lambert Projection to Latitude Longitude In LIS Input File #102

@emmanuelgonz

Description

@emmanuelgonz

Overview

The Planner application will be updated to support 1D lat/lon coordinate arrays (regular rectilinear grid) in addition to the existing 2D Lambert projection coordinates (curvilinear grid). This change was requested by the GSFC team.

Problem

The original interpolate_dataset method in sos/src/planner/main.py failed when processing datasets with 1D lat/lon arrays:

# Original code
lons = dataset.lon.values.flatten()  # Lambert: (3,675,000,) | Lat/Lon: (2971,)
lats = dataset.lat.values.flatten()  # Lambert: (3,675,000,) | Lat/Lon: (1651,)
points = list(zip(lons, lats))       # FAILS for 1D case - different array lengths

Error:

INFO:root:Combining the two datasets.

ERROR:root:Planner on_change processing failed after 65.72 seconds: different number of values and points

joblib.externals.loky.process_executor._RemoteTraceback:

"""

Traceback (most recent call last):

File "/usr/local/lib/python3.9/site-packages/joblib/_utils.py", line 109, in __call__

return self.func(**kwargs)

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 607, in __call__

return [func(*args, **kwargs) for func, args, kwargs in self.items]

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 607, in <listcomp>

return [func(*args, **kwargs) for func, args, kwargs in self.items]

File "/opt/src/planner/main.py", line 164, in interpolate_dataset

zi = griddata(

File "/usr/local/lib/python3.9/site-packages/scipy/interpolate/_ndgriddata.py", line 323, in griddata

ip = LinearNDInterpolator(points, values, fill_value=fill_value,

File "interpnd.pyx", line 326, in scipy.interpolate.interpnd.LinearNDInterpolator.__init__

File "interpnd.pyx", line 95, in scipy.interpolate.interpnd.NDInterpolatorBase.__init__

File "interpnd.pyx", line 104, in scipy.interpolate.interpnd.NDInterpolatorBase._set_values

File "interpnd.pyx", line 232, in scipy.interpolate.interpnd._check_init_shape

ValueError: different number of values and points

"""


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

File "/opt/src/planner/main.py", line 1525, in _on_change_impl

combined_output_file, combined_dataset = self.generate_combined_dataset(

File "/opt/src/planner/main.py", line 223, in generate_combined_dataset

results = Parallel(

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 2072, in __call__

return output if self.return_generator else list(output)

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 1682, in _get_outputs

yield from self._retrieve()

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 1784, in _retrieve

self._raise_error_fast()

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 1859, in _raise_error_fast

error_job.get_result(self.timeout)

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 758, in get_result

return self._return_or_raise()

File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 773, in _return_or_raise

raise self._result

ValueError: different number of values and points

Root cause: For 2D arrays (Lambert), flattening produces same-length arrays that can be zipped. For 1D arrays (lat/lon), lat and lon have different lengths representing separate axes, not coordinate pairs.

Data Format Comparison

Aspect Lambert Projection Lat/Lon
lat shape 2D (1750, 2100) 1D (1651,)
lon shape 2D (1750, 2100) 1D (2971,)
Grid type Curvilinear Regular rectilinear
Resolution ~1km irregular ~0.01° (~1km) regular

Implementation Options

Option 1: Backward Compatible (Both Formats)

Auto-detect coordinate dimensionality and handle each case appropriately:

lon_vals = dataset.lon.values
lat_vals = dataset.lat.values

if lon_vals.ndim == 1 and lat_vals.ndim == 1:
    # 1D coordinates (regular lat/lon grid) - create meshgrid
    lon_grid, lat_grid = np.meshgrid(lon_vals, lat_vals)
    lons = lon_grid.flatten()
    lats = lat_grid.flatten()
else:
    # 2D coordinates (curvilinear grid like Lambert) - flatten directly
    lons = lon_vals.flatten()
    lats = lat_vals.flatten()

points = list(zip(lons, lats))

Option 2: Lat/Lon Only

Assume all input data uses 1D lat/lon arrays:

lon_vals = dataset.lon.values
lat_vals = dataset.lat.values

lon_grid, lat_grid = np.meshgrid(lon_vals, lat_vals)
lons = lon_grid.flatten()
lats = lat_grid.flatten()

points = list(zip(lons, lats))

Comparison

Aspect Backward Compatible Lat/Lon Only
Lines of code ~10 ~5
Lambert support Yes No
Lat/lon support Yes Yes
Runtime overhead Negligible None
Previous demo compatibility Yes No

Decision

Backward compatibility was chosen for the following reasons:

  1. Previous demo reproducibility: An earlier demonstration used Lambert projection data. Maintaining backward compatibility allows re-running or validating against those results.

  2. Minimal overhead: The difference is approximately 5 lines of code with a single if/else branch and no performance impact.

  3. Data source flexibility: If Lambert-projected data is needed in the future, no code changes are required.

  4. Risk asymmetry: Removing backward compatibility and adding it later requires code changes and testing. Keeping it now has negligible cost.

Changes Made

File Modified

sos/src/planner/main.py

Method Updated

interpolate_dataset (lines 140-163)

Change Summary

  1. Coordinate handling: Added dimensionality detection to determine if coordinates are 1D (lat/lon) or 2D (Lambert), then creates meshgrid for 1D case or flattens directly for 2D case.

  2. Variable extraction: Added handling for 3D variable arrays (time, y, x) by extracting the first time step before flattening.

Code Diff

# Before
lons = dataset.lon.values.flatten()
lats = dataset.lat.values.flatten()
points = list(zip(lons, lats))

for var_name in variables_to_interpolate:
    values = dataset[var_name].values.flatten()
# After
lon_vals = dataset.lon.values
lat_vals = dataset.lat.values

if lon_vals.ndim == 1 and lat_vals.ndim == 1:
    lon_grid, lat_grid = np.meshgrid(lon_vals, lat_vals)
    lons = lon_grid.flatten()
    lats = lat_grid.flatten()
else:
    lons = lon_vals.flatten()
    lats = lat_vals.flatten()

points = list(zip(lons, lats))

for var_name in variables_to_interpolate:
    var_values = dataset[var_name].values
    if var_values.ndim == 3:  # (time, y, x)
        values = var_values[0].flatten()
    else:
        values = var_values.flatten()

Testing

To verify the changes work with both formats:

# Test with Lambert projection data
lambert_ds = xr.open_dataset('path/to/lambert/LIS_HIST_202501020000.d01.nc')
assert lambert_ds.lat.ndim == 2
assert lambert_ds.lon.ndim == 2

# Test with lat/lon data
latlon_ds = xr.open_dataset('path/to/latlon/LIS_HIST_201901020000.d01.nc')
assert latlon_ds.lat.ndim == 1
assert latlon_ds.lon.ndim == 1

Run the planner with each data format and verify outputs are generated correctly.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions