Description
I am using pyasdf to store catalogs of detected events, which can include thousands of ObsPy events and waveforms.
When adding these objects to an ASDF dataset with ds.add_quakeml(), I found that each call gets slower as the number of events already in the dataset grows. I suspect this is because the new event is checked for redundancy against every event already in the dataset.
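To make the suspected scaling concrete, here is a toy cost model in plain Python. It does not call pyasdf. It assumes, without confirmation from the pyasdf source, that every add has to touch all events already stored plus the new one. The function names are hypothetical and exist only for this illustration.

```python
def cost_one_at_a_time(n_events: int) -> int:
    """Events touched when adding n_events one by one.

    ASSUMPTION: each add processes every stored event plus the new one.
    """
    total = 0
    stored = 0
    for _ in range(n_events):
        total += stored + 1  # existing events plus the one being added
        stored += 1
    return total


def cost_single_catalog(n_events: int) -> int:
    """Events touched when all n_events are added in a single call."""
    return n_events


for n in (100, 1000, 10000):
    print(n, cost_one_at_a_time(n), cost_single_catalog(n))
```

Under that assumption the per-event workflow touches n(n+1)/2 events in total (5050 for n = 100), while a single catalog add touches n. That would match the steadily rising per-call time described above.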
I had initially designed a workflow that looped through a list of detections and, for each detection, added an ObsPy event along with the waveforms associated with it via the event_id. This makes intuitive sense, but it becomes very slow when adding thousands of waveforms and events. The slowdown can be avoided by first collecting the events in an ObsPy catalog and then adding the entire catalog to the ASDF dataset in one call. This is a fine solution, but it should probably be mentioned in the documentation as the preferred method.
The attached code snippet demonstrates the issue.
import obspy
from obspy.core.event import Event
from obspy.core.event import Catalog
from obspy.core.event import Origin
import time
import pyasdf
import matplotlib.pyplot as plt
# open ASDF dataset
ds = pyasdf.ASDFDataSet("test_dataset.h5")
# set up run
num_it = 100
times = []
# add each event individually
for i in range(num_it):
    # start timer
    timer = time.time()
    # make obspy event
    event = Event()
    event.event_type = "ice quake"
    origin = Origin()
    origin.time = obspy.UTCDateTime(2000 + i, 1, 1)
    event.origins = [origin]
    # add event to ASDF dataset
    ds.add_quakeml(event)
    # stop timer
    runtime = time.time() - timer
    times.append(runtime)
# plot results of adding individual events
plt.plot(range(num_it), times)
ax = plt.gca()
ax.set_ylabel("Time to add event (seconds)")
ax.set_xlabel("Number of events in dataset")
plt.show()
print("Total time to add " + str(num_it) + " events individually: " + str(sum(times)) + " seconds")
# put in an obspy catalog first
event_list = []
# start timer
timer = time.time()
for i in range(num_it):
    # make obspy event
    event = Event()
    event.event_type = "ice quake"
    origin = Origin()
    origin.time = obspy.UTCDateTime(2000 + i, 1, 1)
    event.origins = [origin]
    # add event to list
    event_list.append(event)
catalog = Catalog(event_list)
ds.add_quakeml(catalog)
# stop timer
runtime = time.time() - timer
# print result of adding catalog
print("Total time to add " + str(num_it) + " events in catalog: " + str(runtime) + " seconds")
