-
Notifications
You must be signed in to change notification settings - Fork 11
Description
Selecting spatial subsets is a common task but a little cumbersome to implement particularly for links which include two sites. The idea is a function that takes a dataset or dataarray of sensor / gridded data and the coordinates of a bounding box, and returns a dataset or dataarray with only sensors / pixels in that box. Alternatively, a boolean dataarray could be returned that stores the information whether or not a sensor / pixel is within the bounding box.
Below is a prototype. This function addresses all kinds of data (point, link, grid), which needs to be specified via an argument. An alternative would be three (or two) separate functions. The prototype also distinguishes several modes of what it means for a link to be considered within the bounding box; this might not be necessary though. Any comment if and how such a function should be implemented is appreciated.
def select_spatial_subset(
ds, lonmin, lonmax, latmin, latmax, sensortype="point", include="full"
):
"""Select / cut a spatial subset defined by its bounding coordinates.
Parameters
----------
ds: xr.DataArray | xr.Dataset
Dataset or Dataarray with coordinates `lon` and `lat`
lonmin, lonmax, latmin, latmax: float
Minimum / Maximum longitude / latitude each defining one of the
four boundaries of the subset.
sensortype: str 'point, link, grid'
Wich kind of data.
include: str 'full', 'part', 'mid'
Defines mode of selection:
'full': both start and end points) within the domain;
'part' at least one pole within the domain;
has no effect if 'sensortype' is not 'link'
Returns
-------
xr.DataArray | xr.Dataset
Reduced dataset or dataarray
"""
# for links (CMLs / SMLs)
if sensortype == "link":
# consider midpoints
if include == "mid":
# calculate midpoints
lon_mid = (ds.site_0_lon + ds.site_1_lon) / 2
lat_mid = (ds.site_0_lat + ds.site_1_lat) / 2
# label sensors
cond_mid_lon = (lon_mid <= lonmax) & (lon_mid >= lonmin)
cond_mid_lat = (lat_mid <= latmax) & (lat_mid >= latmin)
cond = cond_mid_lon & cond_mid_lat
# consider start and end points
else:
# label sensors
cond_0_lon = (ds.site_0_lon <= lonmax) & (ds.site_0_lon >= lonmin)
cond_1_lon = (ds.site_1_lon <= lonmax) & (ds.site_1_lon >= lonmin)
cond_0_lat = (ds.site_0_lat <= latmax) & (ds.site_0_lat >= latmin)
cond_1_lat = (ds.site_1_lat <= latmax) & (ds.site_1_lat >= latmin)
# distinguish whether both start and end point
# or only one of them needs to be inside
if include == "full":
cond = cond_0_lon & cond_1_lon & cond_0_lat & cond_1_lat
elif include == "part":
cond = (cond_0_lon & cond_0_lat) | (cond_1_lon & cond_1_lat)
else:
raise ValueError()
# for point-like and gridded data
elif sensortype in ["point", "grid"]:
# label sensors / pixels
cond_lon = (ds.lon <= lonmax) & (ds.lon >= lonmin)
cond_lat = (ds.lat <= latmax) & (ds.lat >= latmin)
cond = cond_lon & cond_lat
else:
raise ValueError()
# apply selection
return ds.where(cond, drop=True)
# alternatively: return cond