Description
At the University of Canterbury, during the development of the NZGMDB, we have been working with waveform data using the FDSN("GEONET") client in Python. While extracting data for specific datetime ranges, we have encountered a range of issues where the returned waveforms often contain multiple traces, gaps, or overlaps.
In most cases where we observe multiple traces on the timeline, we can split the traces to isolate the specific waveform we want, and this is manageable when the split times are properly aligned. We understand that these multiple traces often result from some recordings being “triggered” rather than continuous. This can create gaps in the data, meaning that multiple traces can exist for a single time selection. An example of this is shown below with HORC during the Darfield earthquake.
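As a minimal sketch of the splitting step described above, the snippet below finds the gaps between triggered segments of one channel and picks the segment that covers a time of interest. It represents each trace as a plain `(start, end)` tuple rather than a client object; the function names and the example times are hypothetical and are not taken from the HORC record.

```python
from datetime import datetime, timedelta


def find_gaps(segments, tolerance=timedelta(0)):
    """Return (gap_start, gap_end) pairs between segments of ONE channel.

    `segments` is a list of (start, end) datetime pairs. Gaps no longer
    than `tolerance` are ignored.
    """
    ordered = sorted(segments)
    if not ordered:
        return []
    gaps = []
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start - latest_end > tolerance:
            gaps.append((latest_end, start))
        # Track the furthest end seen so far, so overlaps do not hide gaps.
        latest_end = max(latest_end, end)
    return gaps


def segment_containing(segments, when):
    """Return the first segment that covers `when`, or None."""
    for start, end in sorted(segments):
        if start <= when <= end:
            return (start, end)
    return None


# Made-up times, for illustration only (not real data).
t0 = datetime(2000, 1, 1, 0, 0, 0)
segments = [
    (t0, t0 + timedelta(seconds=60)),
    (t0 + timedelta(seconds=90), t0 + timedelta(seconds=150)),
]
gaps = find_gaps(segments)
wanted = segment_containing(segments, t0 + timedelta(seconds=100))
```

Here `gaps` holds the single 30-second gap between the two segments, and `wanted` is the second segment. If the client in use is ObsPy (an assumption; this issue does not say so), `Stream.get_gaps()` reports comparable information directly.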
However, we have identified several more challenging, recurring categories of extraction anomalies, each of which is illustrated in the attached images with example cases. These issues were investigated across a subset of events, but we anticipate providing a complete survey of all affected data for the years 2000–2024 in the coming months.
Categories of Issues
From our initial analysis, we found the following categories and the number of records affected (based on a small investigation sample). Each category is illustrated with an image and brief description below.
multi-trace-offset: 69 records
In some extractions, there are offsets between the different components and the gaps in the data. Perhaps this is a simple upload or communication error.
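One way such offsets could be flagged is to compare the segment boundaries of each component against a reference component. The sketch below is illustrative only: the function name, the 0.01 s tolerance, and the example times are assumptions, and it pairs segments by position, so it does not handle components with differing numbers of segments.

```python
from datetime import datetime, timedelta


def boundary_offsets(segments_by_channel, reference,
                     tolerance=timedelta(seconds=0.01)):
    """List segment boundaries that differ from the reference channel.

    Returns (channel, segment_index, start_offset, end_offset) tuples for
    every segment whose start or end differs by more than `tolerance`.
    """
    ref_segments = sorted(segments_by_channel[reference])
    mismatches = []
    for channel, segs in sorted(segments_by_channel.items()):
        if channel == reference:
            continue
        pairs = zip(ref_segments, sorted(segs))
        for index, ((ref_start, ref_end), (start, end)) in enumerate(pairs):
            start_offset = start - ref_start
            end_offset = end - ref_end
            if abs(start_offset) > tolerance or abs(end_offset) > tolerance:
                mismatches.append((channel, index, start_offset, end_offset))
    return mismatches


# Made-up segment times for three components (illustration only).
t0 = datetime(2000, 1, 1)
s = lambda seconds: t0 + timedelta(seconds=seconds)
by_channel = {
    "BNZ": [(s(0), s(60)), (s(90), s(150))],
    "BNE": [(s(0), s(60)), (s(90), s(150))],
    "BNN": [(s(0), s(62)), (s(92), s(150))],  # gap shifted by 2 s
}
offsets = boundary_offsets(by_channel, reference="BNZ")
```

In this example only `BNN` is reported, once for each of its two segments.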
overlapping-large: 53 records
Here, large portions of the extraction window are overlapped by exactly duplicated data. Is this perhaps a duplicate upload error, or another sensor directly next to the current one that is streaming to the same data source?
overlapping-small: 48 records
Here we have small overlaps, usually short spikes where there are multiple entries for the same time. These can easily be detected and ignored, but we would appreciate knowing why they occur and, if possible, having them removed.
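A minimal sketch of how overlaps within one channel could be detected and sorted into the two overlap categories is given below. The 10% threshold separating "small" from "large" is purely an assumption for illustration, as are the function names and example times; no threshold is defined in this issue.

```python
from datetime import datetime, timedelta


def find_overlaps(segments):
    """Return (overlap_start, overlap_end) pairs for ONE channel's segments."""
    ordered = sorted(segments)
    if not ordered:
        return []
    overlaps = []
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start < latest_end:
            overlaps.append((start, min(end, latest_end)))
        latest_end = max(latest_end, end)
    return overlaps


def classify_overlap(segments, window_start, window_end, large_fraction=0.1):
    """Label a record by the fraction of the window that is overlapped.

    `large_fraction` is a hypothetical threshold, not a value from the issue.
    Returns None when there is no overlap at all.
    """
    overlaps = find_overlaps(segments)
    if not overlaps:
        return None
    total = sum((end - start for start, end in overlaps), timedelta(0))
    fraction = total / (window_end - window_start)
    return "overlapping-large" if fraction >= large_fraction else "overlapping-small"


# Made-up 100-second extraction window (illustration only).
t0 = datetime(2000, 1, 1)
s = lambda seconds: t0 + timedelta(seconds=seconds)
small = classify_overlap([(s(0), s(60)), (s(59), s(100))], s(0), s(100))
large = classify_overlap([(s(0), s(100)), (s(20), s(80))], s(0), s(100))
```

The first record overlaps for 1 s of the 100 s window and is labelled `overlapping-small`; the second overlaps for 60 s and is labelled `overlapping-large`.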
multi-H: 26 records
Here we have multiple different recordings for the horizontal components, such as channels BN1 and BN2 as well as BNN and BNE. Is this intentional, with N and E being the rotated components that are also stored in the FDSN?
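Records in this category could be flagged from the channel codes alone. The sketch below assumes three-character SEED-style codes in which the last character is the orientation, and treats a record as multi-H when the same two-character prefix has both numbered (1/2) and lettered (N/E) horizontals. The function names are hypothetical.

```python
from collections import defaultdict


def horizontal_sets(channel_codes):
    """Group horizontal orientation codes by band+instrument prefix."""
    by_prefix = defaultdict(set)
    for code in channel_codes:
        # Only three-character codes ending in a horizontal orientation.
        if len(code) == 3 and code[2] in "NE12":
            by_prefix[code[:2]].add(code[2])
    return dict(by_prefix)


def has_multiple_horizontal_sets(channel_codes):
    """True if any prefix has both N/E and 1/2 horizontal channels."""
    for orientations in horizontal_sets(channel_codes).values():
        if orientations & {"N", "E"} and orientations & {"1", "2"}:
            return True
    return False


mixed = has_multiple_horizontal_sets(["BN1", "BN2", "BNN", "BNE", "BNZ"])
single = has_multiple_horizontal_sets(["BNN", "BNE", "BNZ"])
```

Here `mixed` is `True` and `single` is `False`.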
overlapping-large-different-data: 22 records
Here, large portions of the record overlap, but the overlapping data differ in some way, and there is currently no clear way to determine which duplicate to treat as the truth for this instrument.
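To separate exact duplicates from overlaps with differing data, the shared time span of two traces can be compared sample by sample. The following is a sketch under stated assumptions: both traces share one sampling rate and are sample-aligned, traces are given as a start time plus a list of samples, and the function name and example values are made up for illustration.

```python
from datetime import datetime, timedelta


def compare_overlap(start_a, data_a, start_b, data_b, sampling_rate, atol=0.0):
    """Compare two traces over their shared samples.

    Returns (n_compared, n_different). Assumes both traces use the same
    `sampling_rate` (Hz) and that their samples fall on the same time grid.
    """
    # Index in trace A at which trace B's first sample falls.
    shift = round((start_b - start_a).total_seconds() * sampling_rate)
    first_a = max(shift, 0)
    first_b = max(-shift, 0)
    n = min(len(data_a) - first_a, len(data_b) - first_b)
    if n <= 0:
        return (0, 0)
    n_different = sum(
        1 for i in range(n)
        if abs(data_a[first_a + i] - data_b[first_b + i]) > atol
    )
    return (n, n_different)


# Made-up 1 Hz traces; trace B starts 2 s after trace A (illustration only).
t0 = datetime(2000, 1, 1)
identical = compare_overlap(t0, [1, 2, 3, 4, 5],
                            t0 + timedelta(seconds=2), [3, 4, 5, 6], 1.0)
differing = compare_overlap(t0, [1, 2, 3, 4, 5],
                            t0 + timedelta(seconds=2), [3, 9, 5, 6], 1.0)
```

A result with zero differing samples would point to the overlapping-large category, while any non-zero count would point to overlapping-large-different-data. The comparison does not say which duplicate is correct.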
Supporting Material
CSV file: multi_trace_issues_summary.csv
- Provides information such as record_ids, categories, and the datetimes used for extraction, for specific stations, channels, and locations
- Some records are found in multiple categories due to having multiple issues
Next Steps
We would appreciate help and input on why these scenarios occur, and on whether some of them can be fixed so that the data represent the real recordings without duplicates or overlaps.
Let us know if there is anything else we can provide / help with to tackle this challenge together.
Thanks, Joel.