Hi there — thanks so much for creating this statistical test and for implementing a python package for it! I'm excited to use it. That said, I'm finding it a bit slow to run, so I wanted to make a request for the future if possible.
I'm looking specifically at zetatstest. My data is a long, already-interpolated timeseries (30 minutes at 500 Hz), with about 100 sets of 100 types of events. One pass of zetatstest takes 5-10 seconds, so the full run would take a few minutes. According to a profiler, the slowest part of the code on my data is getTimeseriesOffsetOne, and within that the call to getInterpolatedTimeSeries (~90% of the time is spent there). It looks like zetapy re-interpolates the data on every run, regardless of whether the data are already uniformly sampled (which mine are), so it makes sense that this would be a bit slow.
To avoid this, I would suggest allowing the user to pass an optional sampling rate parameter, which would indicate that the data are already interpolated and don't need to be interpolated again. I'd be happy to try implementing this myself, though I don't understand a lot of the other parts of the code (e.g. here, what exactly is vecTime?) and I'm wary of making changes that accidentally break the downstream statistics.
If you can confirm that the interpolation is purely defensive (i.e. it exists only because some input data might not be interpolated, rather than because, say, some pre-processing step un-interpolates the data and it genuinely needs to be re-interpolated), and can help me understand what vecTime is for, I'm more than happy to have a go at this change.
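To make the proposal concrete, here's a rough sketch of what I have in mind. The function name and argument names here are just illustrative (I know zetapy's real internals differ), and I'm assuming the fast path is valid only when the caller guarantees the data sit on a uniform grid:

```python
import numpy as np

def get_values_at_reference_times(vecTime, vecData, vecRefT, dblSamplingRate=None):
    """Illustrative sketch, not zetapy's actual API.

    If dblSamplingRate is given, the caller asserts vecData is already
    uniformly sampled at that rate, so we can index directly instead of
    interpolating. Otherwise fall back to a generic (slower) interpolation,
    stubbed here with np.interp.
    """
    if dblSamplingRate is not None:
        # Map each requested time to its nearest existing sample index.
        vecIdx = np.round((vecRefT - vecTime[0]) * dblSamplingRate).astype(int)
        vecIdx = np.clip(vecIdx, 0, len(vecData) - 1)
        return vecData[vecIdx]
    # Generic path: interpolate onto the requested times.
    return np.interp(vecRefT, vecTime, vecData)
```

For data like mine (500 Hz, uniform grid), the fast path would reduce per-event extraction to a vectorized index operation.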
(There are some other small things that could also provide speed-ups, especially on long timeseries data like mine. For example, the findfirst() function currently makes two full O(n) passes: a boolean comparison on every value, then np.where on the result. Since vecTimestamps is already sorted, a binary search could find the same index in O(log n).)
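For illustration, here's the kind of change I mean for findfirst(). I'm guessing at its exact semantics (first index with value >= the query), so the side= argument might need adjusting:

```python
import numpy as np

def findfirst_linear(vecTimestamps, dblT):
    # Roughly the current approach: boolean comparison over every element,
    # then np.where — two full O(n) passes.
    indices = np.where(vecTimestamps >= dblT)[0]
    return int(indices[0]) if indices.size > 0 else None

def findfirst_sorted(vecTimestamps, dblT):
    # Binary-search alternative: O(log n), valid because vecTimestamps is sorted.
    # side='left' returns the first index where vecTimestamps[idx] >= dblT.
    idx = int(np.searchsorted(vecTimestamps, dblT, side='left'))
    return idx if idx < len(vecTimestamps) else None
```

On a 30-minute, 500 Hz trace (~900,000 samples) queried once per event, this should be a noticeable win, though much smaller than skipping the interpolation.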
Thanks!