Hi there — thanks so much for creating this statistical test and for implementing a python package for it! I'm excited to use it. That said, I'm finding it a bit slow to run, so I wanted to make a request for the future if possible.
I'm looking specifically at zetatstest. My data is a long, already-interpolated timeseries (30 minutes at 500 Hz), with about 100 sets of 100 types of events. One pass of zetatstest takes 5-10 seconds, so the full run would take a few minutes. According to a profiler, the slowest part of the code on my data is getTimeseriesOffsetOne, and within that the call to getInterpolatedTimeSeries (~90% of the time is spent there). It looks like zetapy re-interpolates the data on every run, regardless of whether the data are already uniformly sampled (which mine are), so it makes sense that this would be a bit slow.
To avoid this, I would suggest allowing the user to pass an optional sampling rate parameter, which would indicate that the data are already interpolated and don't need to be interpolated again. I'd be happy to try implementing this myself, though I don't understand a lot of the other parts of the code (e.g. here, what exactly is vecTime?) and I'm wary of making changes that accidentally break the downstream statistics.
If you can confirm that the interpolation is purely defensive (i.e. it exists only because some input data might not be interpolated, rather than because, say, some pre-processing step un-interpolates the data and it genuinely needs to be re-interpolated), and can help me understand what vecTime is for, I'm more than happy to have a go at this change.
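To make the proposal concrete, here's a rough sketch of what I have in mind. The function name and argument names here are just illustrative (I know zetapy's real internals differ), and I'm assuming the fast path is valid only when the caller guarantees the data sit on a uniform grid:

```python
import numpy as np

def get_values_at_reference_times(vecTime, vecData, vecRefT, dblSamplingRate=None):
    """Illustrative sketch, not zetapy's actual API.

    If dblSamplingRate is given, the caller asserts vecData is already
    uniformly sampled at that rate, so we can index directly instead of
    interpolating. Otherwise fall back to a generic (slower) interpolation,
    stubbed here with np.interp.
    """
    if dblSamplingRate is not None:
        # Map each requested time to its nearest existing sample index.
        vecIdx = np.round((vecRefT - vecTime[0]) * dblSamplingRate).astype(int)
        vecIdx = np.clip(vecIdx, 0, len(vecData) - 1)
        return vecData[vecIdx]
    # Generic path: interpolate onto the requested times.
    return np.interp(vecRefT, vecTime, vecData)
```

For data like mine (500 Hz, uniform grid), the fast path would reduce per-event extraction to a vectorized index operation.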
(There are some other small things that could also provide speed-ups, especially on long timeseries data like mine. For example, the findfirst() function currently makes two full O(n) passes: a boolean comparison on every value, then np.where on the result. Since vecTimestamps is already sorted, a binary search could find the same index in O(log n).)
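For illustration, here's the kind of change I mean for findfirst(). I'm guessing at its exact semantics (first index with value >= the query), so the side= argument might need adjusting:

```python
import numpy as np

def findfirst_linear(vecTimestamps, dblT):
    # Roughly the current approach: boolean comparison over every element,
    # then np.where — two full O(n) passes.
    indices = np.where(vecTimestamps >= dblT)[0]
    return int(indices[0]) if indices.size > 0 else None

def findfirst_sorted(vecTimestamps, dblT):
    # Binary-search alternative: O(log n), valid because vecTimestamps is sorted.
    # side='left' returns the first index where vecTimestamps[idx] >= dblT.
    idx = int(np.searchsorted(vecTimestamps, dblT, side='left'))
    return idx if idx < len(vecTimestamps) else None
```

On a 30-minute, 500 Hz trace (~900,000 samples) queried once per event, this should be a noticeable win, though much smaller than skipping the interpolation.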
Thanks!