Hi,
For my applications, I need sample precision when I extract subsets of the data at a given time using the get_waveforms method of an ASDFDataSet instance. However, I observed discrepancies of up to 1 sample between the start time I requested and the start time that was returned.
I tracked the issue down to `_get_idx_and_size_estimate` in `asdf_data_set.py`. The line

`offset = max(0, int((starttime - data_starttime) // dt))`

produces an undesirable result when floating-point imprecision makes `(starttime - data_starttime) / dt` evaluate to X + 0.999999 where the exact value would be X + 1. Instead of returning `offset = X + 1` samples, it returns `offset = X` samples. This happens because floor division (`//`) always rounds down to the next lower integer, and `int` then simply drops the fractional part.
A solution that works for me is to add a number `a < 1` to `(starttime - data_starttime) / dt` before converting it to an integer. With `a = 0.5`, the line

`offset = max(0, int((starttime - data_starttime) / dt + 0.5))`

rounds the offset to the nearest sample, whether that is the lower or the upper one. See this commit: ebeauce@4437051
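To illustrate the difference between the two lines, here is a minimal sketch that uses plain floats in place of the `UTCDateTime` difference. The function names `offset_floor` and `offset_nearest` are hypothetical and only wrap the two expressions quoted above; they are not part of pyasdf.

```python
def offset_floor(elapsed, dt):
    """Current behavior: floor division, then int()."""
    return max(0, int(elapsed // dt))


def offset_nearest(elapsed, dt):
    """Proposed behavior: add 0.5 before truncating, i.e. round to nearest."""
    return max(0, int(elapsed / dt + 0.5))


# elapsed stands in for (starttime - data_starttime), in seconds.
# Exactly 3 samples at dt = 0.1 s, but 0.3 / 0.1 is slightly below 3
# in double precision.
elapsed = 0.3
dt = 0.1

print(elapsed / dt)                 # slightly less than 3.0
print(offset_floor(elapsed, dt))    # 2, one sample too early
print(offset_nearest(elapsed, dt))  # 3, the expected sample
```

Any value of `elapsed / dt` that lands just below an integer triggers the same one-sample shift with the floor-based expression.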
I guess that some people would want to be able to always round up or always round down, so it may be worth adding a keyword argument to `get_waveforms`, similar to the `slice` method of the ObsPy `Stream`. In my opinion, rounding to the nearest sample would be the preferable default behavior.
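As a rough sketch of what such an option could look like, the helper below computes the offset with a selectable rounding mode. The function name `sample_offset`, the `rounding` parameter and its values are assumptions made for illustration; nothing like this exists in pyasdf today, and times are plain floats here rather than `UTCDateTime` objects.

```python
import math


def sample_offset(starttime, data_starttime, dt, rounding="nearest"):
    """Hypothetical helper: sample offset of starttime within the data.

    rounding: "nearest" (default), "floor" or "ceil".
    """
    n = (starttime - data_starttime) / dt
    if rounding == "nearest":
        idx = math.floor(n + 0.5)
    elif rounding == "floor":
        idx = math.floor(n)
    elif rounding == "ceil":
        idx = math.ceil(n)
    else:
        raise ValueError(f"unknown rounding mode: {rounding!r}")
    # Requests before the start of the data map to the first sample.
    return max(0, idx)


print(sample_offset(0.3, 0.0, 0.1))                    # 3
print(sample_offset(0.3, 0.0, 0.1, rounding="floor"))  # 2
print(sample_offset(0.3, 0.0, 0.1, rounding="ceil"))   # 3
```

Keeping "nearest" as the default would give sample-accurate start times without any change on the caller's side, while the other two modes stay available for users who need a strict bound.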
Thank you,
Eric