
Feature request: speeding up zeta ts test by not always interpolating #12

@jonahpearl


Hi there, thanks so much for creating this statistical test and for implementing a Python package for it! I'm excited to use it. That said, I'm finding it a bit slow to run, so I wanted to make a request for the future if possible.

I'm looking specifically at zetatstest. My data is a long interpolated timeseries (30 minutes at 500 Hz), with about 100 sets of 100 types of events. It takes 5-10 seconds to run one pass of zetatstest, so that means it would take a few minutes overall. According to a profiler, the slowest part of the code on my data is getTimeseriesOffsetOne, and within that the call to getInterpolatedTimeSeries (it spends ~90% of its time there). It looks like zetapy automatically re-interpolates the data on every run, regardless of whether the data are already evenly sampled (which they are, in my case), so it makes sense that this would be a bit slow.

To avoid this, I would suggest allowing the user to provide an optional sampling rate parameter, which would indicate that the data are already interpolated and don't need to be interpolated again. I'd be happy to try implementing this myself, though I don't understand a lot of the other bits of the code (e.g. here, what exactly is vecTime) and I'm wary of making changes that accidentally mess up the downstream stats.

If you can confirm that the interpolation is purely superficial (i.e. it's just because some data might not be interpolated, rather than, say, because some pre-processing step is un-interpolating the data and it needs to be re-interpolated), and can help me see what vecTime is for, I'm more than happy to have a go at this change.
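To make the request concrete, here is a rough sketch of the kind of shortcut I have in mind. It is not based on zetapy's internals: `is_uniformly_sampled` and `extract_windows_by_index` are hypothetical helpers, and I'm assuming the goal is to pull a fixed-length window of samples after each event. Note that it snaps each window to the existing sample grid instead of interpolating to the exact event time, which is exactly the kind of difference that might affect the downstream stats.

```python
import numpy as np

def is_uniformly_sampled(timestamps, rtol=1e-6):
    """Return True if consecutive timestamps are evenly spaced (within rtol)."""
    timestamps = np.asarray(timestamps, dtype=float)
    if timestamps.size < 2:
        return False
    dt = np.diff(timestamps)
    return bool(np.allclose(dt, dt[0], rtol=rtol, atol=0.0))

def extract_windows_by_index(timestamps, data, event_times, window_dur, sampling_rate):
    """Hypothetical helper: slice windows by index instead of interpolating.

    Each window starts at the first sample at or after the event time and
    contains round(window_dur * sampling_rate) samples. Events whose window
    would run past the end of the recording are dropped.
    Returns (windows, valid_mask).
    """
    timestamps = np.asarray(timestamps, dtype=float)
    data = np.asarray(data)
    event_times = np.asarray(event_times, dtype=float)

    n_samples = int(round(window_dur * sampling_rate))
    # Binary search for the first sample at or after each event (timestamps must be sorted).
    starts = np.searchsorted(timestamps, event_times, side="left")
    valid = starts + n_samples <= len(data)
    starts = starts[valid]
    # Build an (n_events, n_samples) index matrix and gather all windows at once.
    idx = starts[:, None] + np.arange(n_samples)[None, :]
    return data[idx], valid
```

With a user-supplied sampling rate, something like this could run only when `is_uniformly_sampled` returns True, falling back to the existing interpolation otherwise.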

(There are some other small things that could also provide speedups, especially on long timeseries data like mine. For example, the findfirst() function currently makes two O(n) passes over the array, since it does a boolean comparison on each value and then calls np.where on the result. Since you've already sorted vecTimestamps, you could use a binary search, which is O(log n), to go faster.)
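As an illustration of the binary search idea, here is a small sketch using `np.searchsorted`. I'm assuming the goal is to find the first index whose value is at or above a threshold in a sorted array; the real findfirst() may have different semantics, and both function names below are made up for the comparison.

```python
import numpy as np

def find_first_linear(sorted_values, threshold):
    """Linear scan: compare every element, then collect the matching indices."""
    hits = np.where(np.asarray(sorted_values) >= threshold)[0]
    return int(hits[0]) if hits.size else None

def find_first_binary(sorted_values, threshold):
    """Binary search: O(log n), valid only if sorted_values is sorted ascending."""
    i = int(np.searchsorted(sorted_values, threshold, side="left"))
    return i if i < len(sorted_values) else None
```

Both return the same index on sorted input, but the binary version never touches most of the array, which matters on a 30-minute, 500 Hz recording.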

Thanks!
