Push window_len down into the core e-divisive computation

In Otava, Piotr introduced window_len as a user facing parameter, and for a given data point, only `window_len/2` data points are included in the computation. This gives better quality results, in particular it makes it possible to find two adjacent change points. This is because computation is focused on the local neighborhood of each candidate change point, and data points far away are completely ignored.

In Piotr's version, the input data is split into `2*datapoints/window_len` windows and the regular e-divisive is run on each window. After this there is a process to merge the small windows together. The windows also overlap, so the same point could be found once or twice, with different p-value for each.

Mathematically it would be more elegant, and consistent, and potentially also a performance improvement (because we don't count every point twice because no need for overlap) to rather just do a single pass, as in the original e-divisive, but incorporate window length into the core e-divisive code. This could be something like this:

https://github.com/mongodb/signal-processing-algorithms/blob/74d6a8fa3a377ee5dce4821ba9dfd2f2528790b2/src/signal_processing_algorithms/e_divisive.py#L222-L250

        length = window_len
        n = 1   # Note: this is 2 in the quoted code, but that is unnecessarily conservative.
        w_start = max( n - window_len / 2, 0)
        w_stop = min( n + window_len / 2, len(series) )
        m = w_stop - n

        # term1 = sum(diffs[i][j] for i in range(w_start, n) for j in range(n, w_stop))
        term1 = np.sum(diffs[w_start:n, n:w_stop])

        # term2 = sum(diffs[i][k] for i in range(w_start, n) for k in range(i + 1, n))
        term2 = np.sum(np.triu(diffs[w_start:n, w_start:n], 0))

        # term3 = sum(diffs[j][k] for j in range(n, w_stop)
        #                         for k in range(j + 1, w_stop)) 
        term3 = np.sum(np.triu(diffs[n:w_stop, n + 1 : w_stop], 0))

        qhat_values[n] = self.calculate_q(term1, term2, term3, m, n)

        for n in range(3, (length - 2)):
            w_start = max( n - window_len / 2, 0)
            w_stop = min( n + window_len / 2, len(series) )
            m = w_stop - n

            # TODO: may need a +1 and / or -1 somewhere
            column_delta = np.sum(diffs[n - 1, w_start : n - 1])
            row_delta = np.sum(diffs[n:w_stop, n - 1])

            term1 = term1 - column_delta + row_delta
            term2 = term2 + column_delta
            term3 = term3 - row_delta

            qhat_values[n] = self.calculate_q(term1, term2, term3, m, n-w_start)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Push window_len down into the core e-divisive computation #84

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Push window_len down into the core e-divisive computation #84

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions