Skip to content

Conversation

@theoteske
Copy link
Contributor

Pull Request

Description

This pull request addresses and resolves issue #282.

The select_time_slice_nwp function would previously fail when an NWP data source contained init_time_utc coordinates that were not rounded to the hour (e.g., 12:30) due to the assumption in the time-selection logic that init times would align with hourly intervals. This PR makes the code more robust to external datasets that may not have these nice hourly intervals.

The PR simultaneously improves the performance of the time-selection logic by using the vectorized numpy.searchsorted function in place of the previous list comprehension to efficiently find the appropriate init times. This operation is now approximately 1.5x to 2.6x faster based on local benchmarking.

Additionally, we add more descriptive ValueError exceptions to provide clearer feedback to users when no valid NWP data can be found for a requested time range.

Fixes #282

How Has This Been Tested?

Comprehensive tests have been added to tests/select/test_select_time_slice.py to verify the update. These include a new test case that uses a mock NWP DataArray with non-hourly init times to confirm that the correct time slices are selected, as well as tests to ensure that the new ValueError exceptions are raised under the appropriate conditions.

The existing test suite also still passes.

  • Yes

  • Yes, a sanity check was performed by confirming that the init_time_utc and step coordinates of the output DataArray correctly align to produce the expected target times.

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@dfulu
Copy link
Member

dfulu commented Jan 21, 2026

Thanks for the work on this @theoteske, its looking really good.

If I understand from the original issue #282, the problem setup in that issue might have been more like:

def nwp_data_with_offset_steps():
    data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
    init_time = pd.to_datetime(["2023-01-01 12:00", "2023-01-01 13:00"])
    step = pd.to_timedelta([0.5, 1.5, 2.5], unit="h")
    channel = ["t", "dswrf"]
    x = np.arange(4)
    y = np.arange(5)
    da = xr.DataArray(
        data,
        coords=[init_time, step, channel, x, y],
        dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
    )
    return da

or perhaps it was

def nwp_data_with_offset_init_times():
   data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
   init_time = pd.to_datetime(["2023-01-01 12:30", "2023-01-01 13:30"])
   step = pd.to_timedelta([0, 1, 2], unit="h")
   channel = ["t", "dswrf"]
   x = np.arange(4)
   y = np.arange(5)
   da = xr.DataArray(
       data,
       coords=[init_time, step, channel, x, y],
       dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
   )
   return da

@zaryab-ali, could you help to clear this up?

@zaryab-ali
Copy link
Contributor

the second code snippet is more accurate i think because step was rounded down and there was a value missing at the end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

issue in select_time_slice_nwp

3 participants