Conversation

Contributor

@Sowiks Sowiks commented Nov 11, 2025

This is a PR to replace the MongoDB signal_processing_algorithms package with an internal implementation, as discussed in https://lists.apache.org/thread/4vwp79kmsjd3zbf4fjcgkggf33jot65c . I tried to make minimal changes to the existing code (analysis.py and series.py). Better integration is possible, but it can wait till the next PR.

There is, however, an issue that I identified during my implementation of the methodology from A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data, [Matteson and James](https://arxiv.org/abs/1306.4933). Long story short, it doesn't seem that signal_processing_algorithms==1.3.5 has a correct implementation. Namely, let's look at section 2.2, Estimating the Location of a Change Point, from the paper, more specifically formula (7) and the following discussion in the last paragraph of that section. I quote them here for your convenience:

... Let $Z_1, \cdots, Z_T \in \mathbb{R}^d$ be an independent sequence of observations and let $1 \leq \tau < \kappa \leq T$ be constants. Now define the following sets $X_\tau = \{ Z_1, Z_2, \cdots , Z_\tau \}$ and $Y_\tau(\kappa) = \{ Z_{\tau + 1}, Z_{\tau + 2} , \cdots , Z_\kappa \}$. A change point location $\hat{\tau}$ is then estimated as
$$(\hat{\tau}, \hat{\kappa}) = \text{arg}\max\limits_{(\tau, \kappa)} \hat{Q} (X_\tau, Y_\tau(\kappa); \alpha).$$
... If it is known that at most one change point exists, we set $\kappa = T$. Otherwise, the variable $\kappa$ is introduced to alleviate a weakness of bisection, as mentioned in Venkatraman (1992), in which it may be more difficult to detect certain types of distributional changes in the multiple change point setting using only bisection. For example, if we fix $\kappa = T$ and the set $Y_\tau(T)$ contains observations across multiple change points (e.g., distinct distributions), then it is possible that the resulting mixture distribution in $Y_\tau(T)$ is indistinguishable from the distribution of the observations in $X_\tau$, even when $\tau$ corresponds to a valid change point. We avoid this confounding by allowing $\kappa$ to vary, with minimal computational cost by storing the distances mentioned above. This modification to bisection is similar to that taken in Olshen and Venkatraman (2004).

The main idea of that section is to allow $\kappa$ to vary, not to simply set it to the end of the series, $\kappa=T$. However, when I implemented the methodology as in the paper (allowing $\tau < \kappa \leq T$ to vary), the tigerbeetle tests failed. With a little experimentation I found that the erroneous implementation (with fixed $\kappa=T$) resolves the issues with the tigerbeetle tests, which is unlikely to be a coincidence. I think this is because the signal_processing_algorithms package has a mistake/typo in it where $\kappa$ is fixed at $T$ (at least in version 1.3.5). Moreover, this would also explain observations from Hunter: Using Change Point Detection to Hunt for Performance Regressions [Fleming et al.] that caught my eye. In section 3.3, Fixed-Sized Windows, the authors say:

As we began using Hunter on larger and larger data series, we discovered that change points identified in previous runs would suddenly disappear from Hunter’s results. This issue turned out to be caused by performance regressions that were fixed shortly after being introduced. This is a known issue with E-divisive means and is discussed in [5]. Because E-divisive means divides the time series into two parts, most of the data points on either side of the split showed similar values. The algorithm, therefore, by design, would treat the two nearby changes as a temporary anomaly, rather than a persistent change, and therefore filter it out.

The issue they discuss in that section seems to be related to the same idea: if you don't allow $\kappa$ to vary, the algorithm might miss some change points if they are within the interval. I wonder if they also used a fixed $\kappa$, which led to the described issues.
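To make the difference concrete, here is a minimal sketch (not the PR's code) of the two estimators; qhat(tau, kappa) stands in for $\hat{Q}$ evaluated on the split series[:tau] vs series[tau:kappa] and is assumed to be given:

def best_tau_fixed_kappa(series, qhat):
    '''Erroneous variant: kappa pinned to T, as signal_processing_algorithms 1.3.5 appears to do.'''
    T = len(series)
    return max(range(1, T), key=lambda tau: qhat(tau, T))

def best_tau_varying_kappa(series, qhat):
    '''Formula (7) from the paper: maximize over all pairs tau < kappa <= T.'''
    T = len(series)
    pairs = [(tau, kappa) for kappa in range(2, T + 1) for tau in range(1, kappa)]
    tau, _kappa = max(pairs, key=lambda p: qhat(*p))
    return tau

The second form corresponds to commit-2 below; the first reproduces the old behavior.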

Nevertheless, going back to this PR. It contains three commits:

  1. The first commit matches the output of signal_processing_algorithms in all tests. If there is an error in my logic somewhere, this is the commit with which to replace the signal_processing_algorithms package.
  2. The second commit also corrects the fixed-$\kappa$ issue. This produces different results in the tigerbeetle tests, which were corrected here.
  3. Edited: added support for Python 3.8 and 3.9.

Finally, I included some visualizations of the tigerbeetle tests for commit-1 vs commit-2 so you can see whether they make sense.

import matplotlib.pyplot as plt
import numpy as np
series = [26705, 26475, 26641, 26806, 26835, 26911, 26564, 26812, 26874, 26682, 15672, 26745, 26460, 26977, 2685
 23547, 23674, 23519, 23670, 23662, 23462, 23750, 23717, 23524, 23588, 23687, 23793, 23937, 23715, 23570, 23730, 23690, 23699, 23670, 23860, 23988, 23652, 23681, 23798, 23728, 23604, 23523, 23412, 23685, 23773, 23771, 23718, 23409, 23739, 23674, 23597, 23682, 23680, 23711, 23660, 23990, 23938, 23742, 23703, 23536, 24363, 24414, 24483, 24509, 24944, 24235, 24560, 24236, 24667, 24730, 28346, 28437, 28436, 28057, 28217, 28456, 28427, 28398, 28250, 28331, 28222, 28726, 28578, 28345, 28274, 28514, 28590, 28449, 28305, 28411, 28788, 28404, 28821, 28580, 27483, 26805, 27487, 27124, 26898, 27295, 26951, 27312, 27660, 27154, 27050, 26989, 27193, 27503, 27326, 27375, 27513, 27057, 27421, 27574, 27609, 27123, 27824, 27644, 27394, 27836, 27949, 27702, 27457, 27272, 28207, 27802, 27516, 27586, 28005, 27768, 28543, 28237, 27915, 28437, 28342, 27733, 28296, 28524, 28687, 28258, 28611, 29360, 28590, 29641, 28965, 29474, 29256, 28611, 28205, 28539, 27962, 28398, 28509, 28240, 28592, 28102, 28461, 28578, 28669, 28507, 28535, 28226, 28536, 28561, 28087, 27953, 28398, 28007, 28518, 28337, 28242, 28607, 28545, 28514, 28377, 28010, 28412, 28633, 28576, 28195, 28637, 28724, 28466, 28287, 28719, 28425, 28860, 28842, 28604, 28327, 28216, 28946, 28918, 29287, 28725, 29148, 29541, 29137, 29628, 29087, 28612, 29154, 29108, 28884, 29234, 28695, 28969, 28809, 28695, 28634, 28916, 29852, 29389, 29757, 29531, 29363, 29251, 29552, 29561, 29046, 29795, 29022, 29395, 28921, 29739, 29257, 29455, 29376, 29528, 28909, 29492, 28984, 29621, 29026, 29457, 29102, 29114, 28924, 29162, 29259, 29554, 29616, 29211, 29367, 29460, 28836, 29645, 29586, 28848, 29324, 28969, 29150, 29243, 29081, 29312, 28923, 29272, 29117, 29072, 29529, 29737, 29652, 29612, 29856, 29012, 30402, 29969, 29309, 29439, 29285, 29421, 29023, 28772, 29692, 29416, 29267, 29542, 29904, 30045, 29739, 29945, 29141, 29163, 29765, 29197, 29441, 28910, 29504, 29614, 29643, 29506, 29420, 29672, 29432, 29784, 29888, 29309, 29247, 29816, 29254, 29813, 29451, 29382, 29618, 28558, 29845, 29499, 29283, 29184, 29246, 28790, 29952, 29145, 29415, 30437, 29227, 29605, 29859, 29156, 29807, 29406, 29734, 29861, 29140, 29983, 29832, 29919, 29896, 29991, 29266, 29001, 29459, 29548, 29310, 29042, 29303, 29894, 29091, 29018, 29537, 29614, 29180, 29736, 29500, 29218, 29581, 28906, 28542, 29306, 28987, 29878, 28865, 30272, 29707, 29662, 29815, 30492, 29347, 30096, 29054, 30238, 28813, 31895, 28915]
def plot(old, new):
    plt.style.use('ggplot')
    plt.plot(series)
    plt.plot(old, np.take(series, old), 'o')
    plt.plot(new, np.take(series, new), 'kx')
    plt.legend(['Data', 'Old', 'New'])
    plt.show()
# window_len=30, max_pvalue=0.01, min_magnitude=0.05
plot(old=[27, 71], new=[15, 71])
# window_len=30, max_pvalue=0.05, min_magnitude=0.05
plot(old=[16, 71], new=[15, 71])
# window_len=30, max_pvalue=0.1, min_magnitude=0.05
plot(old=[16, 71], new=[10, 11, 15, 71, 363])
# window_len=30, max_pvalue=0.2, min_magnitude=0.05
plot(old=[16, 71], new=[10, 11, 15, 71])
# window_len=30, max_pvalue=0.2, min_magnitude=0.0
plot(
    old=[16, 27, 29, 56, 58, 60, 61, 69, 71, 82, 83, 91, 95, 108, 114, 116, 117, 131, 138, 142, 148, 165, 167, 178, 187, 189, 190, 192, 206, 212, 213, 220, 241, 243, 244, 246, 247, 249, 260, 266, 268, 272, 274, 275, 278, 282, 284, 288, 295, 297, 311, 314, 325, 330, 347, 351],
    new=[3, 6, 7, 10, 11, 13, 15, 16, 28, 29, 35, 37, 39, 41, 44, 48, 49, 56, 58, 61, 65, 66, 69, 71, 74, 76, 82, 95, 108, 117, 125, 126, 129, 131, 136, 137, 142, 148, 165, 169, 187, 190, 192, 197, 200, 212, 220, 241, 243, 246, 247, 249, 250, 260, 265, 266, 268, 278, 282, 288, 305, 306, 325, 330, 337, 338, 340, 347, 349, 363]
)
# window_len=30, max_pvalue=0.1, min_magnitude=0.0
plot(
    old=[16, 27, 29, 56, 58, 61, 71, 82, 95, 113, 116, 117, 131, 138, 142, 148, 157, 165, 167, 178, 187, 189, 192, 206, 212, 213, 220, 246, 247, 249, 260, 266, 268, 272, 278, 282, 311, 312, 325, 330, 347, 351],
    new=[3, 6, 10, 11, 15, 16, 28, 29, 35, 37, 39, 41, 44, 48, 49, 61, 71, 95, 117, 131, 142, 148, 165, 169, 192, 206, 212, 260, 265, 268, 278, 282, 288, 305, 363]
)
# window_len=30, max_pvalue=0.01, min_magnitude=0.0
plot(
    old=[27, 61, 71, 82, 95, 131, 142, 148, 192, 212, 249, 260, 265, 353],
    new=[15, 26, 61, 71, 95, 117, 131, 142, 148, 165, 169, 192, 212, 260]
)
# window_len=30, max_pvalue=0.001, min_magnitude=0.0
plot(
    old=[71, 95, 113, 131, 142, 148, 192, 212, 260],
    new=[15, 61, 71, 95, 117, 131, 142, 148, 192, 212, 260]
)
# window_len=30, max_pvalue=0.0001, min_magnitude=0.0
plot(
    old=[71, 95, 113, 131, 192, 212],
    new=[71, 95, 117, 131, 142, 148, 192, 212]
)
# window_len=30, max_pvalue=0.00001, min_magnitude=0.0
plot(old=[71, 95, 131, 192, 212], new=[71, 95, 131, 192, 212])

@Sowiks Sowiks marked this pull request as ready for review November 11, 2025 22:19
@henrikingo
Contributor

Thanks a lot @Sowiks for this! You have a valuable skill in being able to grasp the academic-level math and then still explain your findings to normal people with simple pictures. Btw, this is why I like this tigerbeetle demo dataset from 2023. In 200+ points it exercises many of the phenomena you might encounter in this field, and so it captured your bug, or fix rather, too.

Amazingly, I vaguely remember how this happened at MongoDB back then. I remember asking about this kappa, and the people who had read the Matteson and James paper (I would read it much later) explained that we can choose a value for it freely. So we did, and I never thought of it again. We thought of it as a parameter we could choose, not that we were supposed to use all values. Since the by-the-book algorithm ends in a Monte Carlo simulation, we apparently accepted the fact that the reference implementation in R often produced different change points.

So it seems with your fix the algorithm will perform even better than it ever did. (And even now Otava has outperformed all alternatives by a good margin!) It now seems to hit the blind spots that always annoyed me. In a way, Piotr's approach of applying small windows kind of achieves the same behavior.

Do I understand correctly that running this Kappa from 0 to T is exactly the same as if I would start with two points, then append one point at a time to the timeseries, re-running otava between each step, and then keeping all change points found along the way? If yes, then it means that storing the previous results becomes the norm and we should pay more attention to a format and api for doing that.

Will review code over the weekend but from the text and pictures I can already tell this is good stuff. Thanks for contributing!

@henrikingo
Contributor

Btw, your illustrations also nicely show that with the bugs fixed, unless you're really lax about perf regressions, min_magnitude is actually unnecessary. It has IMO historically been used to cover up bugs. (Of which this is not the first one.)

Contributor

@henrikingo henrikingo left a comment


It was a joy to review this. My comments are on understandability, code comments, and naming.

Oh, please coordinate with Aleksander and the 0.7.0 release on when to merge this.

left_interval = interval
right_interval = intervals[i + 1]
break
elif (interval.start is None or interval.start < candidate.index) and (interval.stop is None or candidate.index < interval.stop):
Contributor

How can interval start ever be None? If the end points aren't known I'd say it's not an interval at all?

Contributor Author

Here, None is not "unknown" but a value corresponding to either the start or the end of the list. When you call array[i:j], it creates a slice slice(i, j), and you get the result of array[slice(i, j)]. However, in Python you can omit the starting and/or ending parameter of a slice: array[0:i] == array[:i] == array[slice(None, i)] and array[i:len(array)] == array[i:] == array[slice(i, None)]. This code is just to support such slices.
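For illustration, such None bounds can be normalized before doing comparisons like the one above; a minimal sketch, not the PR's code:

from typing import Tuple

def bounds(interval: slice, length: int) -> Tuple[int, int]:
    # None start means "from the beginning"; None stop means "to the end".
    start = 0 if interval.start is None else interval.start
    stop = length if interval.stop is None else interval.stop
    return start, stop

assert bounds(slice(None, 5), 10) == (0, 5)   # array[:5]
assert bounds(slice(3, None), 10) == (3, 10)  # array[3:]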

Contributor

Ah, makes sense. Could you add a short comment somewhere in the class definition?

Contributor Author

Will do

1. Divisive algorithm.
if the candidate is a new potential change point, i.e., its index is inside any interval, then
we split the interval by the candidate's index to get left and right subseries.
2. Merge step in t-test algorithm.
Contributor

Terminology: I would say any variant of the algorithm can use a t-test, a permutation test, or something else. The Merge step is part of what I'd call the split-merge strategy, or perhaps the weak change points version. I guess the Hunter paper just called this Fixed size windows, but I could think of many kinds of windows (such as sliding) that don't need to be merged. And this is at least closely related to weak change points, but I'm unsure whether weak change points will be needed now? Otoh, maybe we'll soon get rid of the split-merge too, in which case we can ignore this discussion.

Contributor

Oh, maybe we haven't been introduced properly. I'm the one who is serious about naming things. David is the one who knows all about cache invalidation.

Contributor Author

I agree that the terminology is sloppy here :( By "Divisive algorithm" I meant the process of splitting the interval (0, len(series)) into subintervals via change points. By "Merge step" I meant the process of merging split intervals when we eliminate weak change points. It doesn't have to be a t-test specifically, correct.

pts = algo.get_change_points(series)
return pts, None
def compute_change_points_orig(series: Sequence[SupportsFloat], max_pvalue: float = 0.001, seed: Optional[int] = None) -> Tuple[PermCPList, Optional[PermCPList]]:
tester = PermutationsSignificanceTester(alpha=max_pvalue, permurations=100, calculator=PairDistanceCalculator, seed=seed)
Contributor

permutations

Contributor Author

Thanks! Will correct the typo. But at least it's a consistent typo :)

@dataclass
class CandidateChangePoint:
'''Candidate for a change point. The point that maximizes Q-hat function on [start:end+1] slice'''
index: int
Contributor

Maybe it would be clearer if somewhere you also use either:

start < index <= end

or the equivalent

(start, end]

and then say, "...which corresponds to the slice [start:end+1] in Python, as well as range(start, end+1)".

Could add a mention that indexes are always the original index from the full series [0,T] and not zero-based for each interval. (start < index <= end, never 0 < index <= end-start)

A change point at index 0 is impossible, because a single point just cannot change. But this IMO follows logically; it is not a pre-condition. (There's a fun philosophical debate here: where exactly are the change points? In the case of a series of git commits, it is clear that the change point is the commit that causes the regression/improvement. Others could argue change is what happens in the gaps between the points of measurement.) In Otava we index change points from 1 to T, because this way they match their corresponding test result in the input series. So you could conclude that Otava is in the camp that change happens, or is observed at least, at the point after the change.

Anyway, should we add an invariant here that index > 0?
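(For concreteness, such an invariant could look like the sketch below, reusing the two fields visible in the diff; the __post_init__ hook is just one way to do it:)

from dataclasses import dataclass

@dataclass
class CandidateChangePoint:
    '''Candidate for a change point; index is into the full series, so index > 0.'''
    index: int
    qhat: float

    def __post_init__(self):
        # A change point cannot sit at index 0: there is nothing before it to change from.
        assert self.index > 0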

Contributor Author

  1. I will correct the comments so they consistently talk in terms of slices, not indexes. (I will keep indexes in calculator.py because it's explicit there where we switch from slices to indexes, and, in my opinion, the formulas are easier to follow in index notation there.)
  2. I was thinking of doing that in the next PR, when I make a better integration across Otava. One of the criteria for the current implementation was to minimize changes to the existing code in Otava for easier review.

Contributor

Ok with 2.

change_points[index] = tester.change_point(cp.to_candidate(), series, intervals)

recompute(weakest_cp_index)
recompute(weakest_cp_index + 1)
Contributor

Do we test for the case where weakest_cp_index == max(index)?

Contributor Author

We don't. Good catch! It should be

if weakest_cp_index == len(change_points):
    recompute(weakest_cp_index - 1)
else:
    recompute(weakest_cp_index)
    recompute(weakest_cp_index + 1)

kappas = np.arange(start + 2, end + 2)[None, :]

A = np.zeros((end - start, end - start))
A_coefs = 2 / (kappas - start)
Contributor

Just for discussion... but it is my opinion that these coefficients are added to the formula only to pass some robustness "almost certainly in the limit" proof. They are never really explained or justified the way every other term is. They have the effect of muting the q values at the ends of an interval, and significantly inflating those in the middle. As a result, given many good candidate q values, the algorithm tends to pick change points first in the middle of a series.

Anyway, it's for the future, but to me those are fair game once we want to make any mods to this implementation.

Contributor Author

They scale the values of the estimate to be unbiased (in the statistical sense). I strongly suggest keeping them if we care about the theoretical support behind the methods. With that being said, the Hunter paper introduces the use of the t-test without any theory behind it (unless I missed something), and it's still being used.
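For reference, this is the sample divergence from Section 2 of the Matteson and James paper, as I read it, with the coefficients in question:

$$\hat{\mathcal{E}}(X_n, Y_m; \alpha) = \frac{2}{mn}\sum_{i=1}^{n}\sum_{j=1}^{m} |X_i - Y_j|^{\alpha} - \binom{n}{2}^{-1}\sum_{1 \leq i < k \leq n} |X_i - X_k|^{\alpha} - \binom{m}{2}^{-1}\sum_{1 \leq j < k \leq m} |Y_j - Y_k|^{\alpha},$$

$$\hat{Q}(X_n, Y_m; \alpha) = \frac{mn}{m+n} \hat{\mathcal{E}}(X_n, Y_m; \alpha).$$

Each coefficient is the reciprocal of the number of pairs in its sum (e.g., $\binom{n}{2}^{-1} = \frac{2}{n(n-1)}$), which is exactly what makes each term an unbiased estimate of the corresponding expected distance; the 2 / (kappas - start)-style factors in the diff appear to play this role.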

Contributor

There's some value in the theoretical soundness, yes. And you are correct that the use of the t-test was purely based on empirical observations coupled with subjective judgement: it's faster, deterministic, and finds more things that upon inspection a human agrees were valid bugs. (The Hunter paper also introduced a method for generating real-world but still objective test data sets, but the chronology of development is that the changes to the algorithm were done first.) That said, the project was done by two PhDs, one of whom was in math, and they tested other significance tests too. So it wasn't as uneducated as just me looking at some graphs and picking the one I like. (My above comment is in that category for sure, and like I said, this is just discussion.)

And theoretically speaking I think the t-test is wrong, because performance test results are not known to be normally distributed. Unless of course the thinking is that ultimately everything is.

C[:-1, 1:] = C_coefs[:-1, 1:] * np.flipud(np.cumsum(np.flipud(H[1:, 1:]), axis=0))

# Element of matrix `Q_{i, j}` is equal to `Q(τ, κ) = Q(i + 1, j + 2) = QQ(sequence[start : i + 1], sequence[i + 1 : j + 2])`.
# So, critical point is `τ = i + 1`.
Contributor

The fact that you end up with parameters i+1, j+2 here, rather than the more common i, j+1, suggests to me you need to shift your indexing to the left one step: (0,T) or )0,T), not (1,T+1).

Contributor

If you don't have time to do that in the near future, the shifting of indexes can be a separate follow-up task too.

Contributor Author

Here it is because the values of Q are shifted with respect to the series. The function Q is defined on non-empty consecutive subseries, so the shortest possible case is Q(series[0:1], series[1:2]), which corresponds to Q[0, 0]. The reason I didn't keep Q[i, j] ~ Q(series[0:i], series[i:j]) was to reduce the matrix sizes by cutting out columns and rows with only zeros. I guess I tried optimizing what I could :)
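A toy sanity check of the mapping described above (illustrative only, not code from the PR):

# Q[i, j] ~ Q(series[start : tau], series[tau : kappa]) with
# tau = start + i + 1 and kappa = start + j + 2 (for j >= i).
start = 0
for i in range(3):
    for j in range(i, 3):
        tau, kappa = start + i + 1, start + j + 2
        assert tau - start >= 1 and kappa - tau >= 1  # both subseries non-empty
# The shortest case i = j = 0 gives Q(series[0:1], series[1:2]) ~ Q[0, 0],
# so no all-zero rows or columns need to be stored.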

Contributor Author

Let me know if you would prefer to pad matrices with zeros for the sake of indexing.

Contributor

No this is ok. I think it's clearer now on second read.


def get_candidate_change_point(self, interval: slice) -> CandidateChangePoint:
'''For a given `slice(start, stop)` finds potential critical point in subsequence series[slice],
i.e., from index `start` to `stop - 1` inclusive. For simplicity, we'll use `end = stop - 1`.
Contributor

To be super clear, maybe use "interval" and (,] notation if you are talking math, and slice with [] and range() in a Python context.

Contributor Author

Will do


Q = self._get_Q_vals(start, end)
i, j = np.unravel_index(np.argmax(Q), Q.shape)
return CandidateChangePoint(index=i + 1 + start, qhat=Q[i][j])
Contributor

Also this feels reassuringly familiar <3

@Sowiks
Contributor Author

Sowiks commented Nov 16, 2025


Thank you for the flattering review :)

Regarding your question:

> Do I understand correctly that running this Kappa from 0 to T is exactly the same as if I would start with two points, then append one point at a time to the timeseries, re-running otava between each step, and then keeping all change points found along the way?

It's kind of a loaded question, but the short answer is no. However, I think you'll be interested in the long answer.

It's not exactly the same, because as we add a point to the end of the series, it might cause a different point to become the best candidate. The minimal example that I came up with is [0, 29, 60] and adding 27 to it:

>>> series = np.array([0, 29, 60])
>>> calculator.get_next_candidate(slice(0, None)) #  whole series
CandidateChangePoint(index=2, qhat=41.33333333333333) # it's x_2 = 60

However,

>>> series = np.array([0, 29, 60, 27])
>>> calculator.get_next_candidate(slice(0, None)) #  whole series
CandidateChangePoint(index=1, qhat=41.5) # it's x_1 = 29

Now, the intuition here is that when we had only three points, the jump 0 -> 29 is smaller than 29 -> 60, so 60 has the best potential to be a change point. When we added the 4th point, we had enough evidence to see that the jump 0 -> [29, 60, 27] has the most potential now. So, the answer is "no" in general. However, I'm not sure if this remains possible as the length of the sequence increases, and whether one point can still make a difference.

Next, if I understand correctly why you are asking this question, it is because of the issues described in the Hunter paper. Namely, the claim:

> As we began using Hunter on larger and larger data series, we discovered that change points identified in previous runs would suddenly disappear from Hunter’s results. This issue turned out to be caused by performance regressions that were fixed shortly after being introduced. This is a known issue with E-divisive means and is discussed in [5]. Because E-divisive means divides the time series into two parts, most of the data points on either side of the split showed similar values. The algorithm, therefore, by design, would treat the two nearby changes as a temporary anomaly, rather than a persistent change, and therefore filter it out.
> Figure 1 illustrates this issue.

[Figure 1 from the Hunter paper]

I tried to generate data similar to the one on the picture and run a few tests:

>>> def figure1_test(N):
...     base = 440 + np.random.randn(N) * 5
...     drop = 400 + np.random.randn(N) * 5
...     recover = 445 + np.random.randn(N) * 5
...     series = np.concatenate((base, drop, recover))
...     tester = PermutationsSignificanceTester(alpha=0.00001, permurations=100, calculator=PairDistanceCalculator, seed=1)
...     detector = ChangePointDetector(tester, PairDistanceCalculator)
...     points = detector.get_change_points(series)
...     return [p.index for p in points]
...
>>> figure1_test(10)
[10, 20]
>>> figure1_test(100)
[100, 200]
>>> figure1_test(1000) #  took quite a while because of the permutation tester
[1000, 2000]

As you can see, (1) the change points are correctly identified for this pattern, and (2) over a wide range of sequence lengths.

@henrikingo
Contributor

> Regarding your question:
>
> > Do I understand correctly that running this Kappa from 0 to T is exactly the same as if I would start with two points, then append one point at a time to the timeseries, re-running otava between each step, and then keeping all change points found along the way?
>
> It's kind of a loaded question, but the short answer is no. However, I think you'll be interested in the long answer.
>
> It's not exactly the same, because as we add a point to the end of the series, it might cause a different point to become the best candidate. The minimal example that I came up with is [0, 29, 60] and adding 27 to it:

When asking the question, I had slightly misunderstood where this happens: this is about generating the set of Q-values, not the set of change points found (weak or regular...).

So I think the correct use of kappa increases the set of q-values, and therefore candidate change points, so that the change points that the Hunter paper describes as missing, or disappearing rather, could be found. But you're right that only the best one will be picked, and then of course in the next iteration things have changed, so I guess it is not guaranteed that varying kappa will generate all the same change points as would be found by computing the algorithm over all {series[:1], series[:2] ... series[:N]} and just keeping the union of all change points.

Even so, the effect of:

0 < tau < kappa <= T, where kappa goes from 2 to T (your implementation)

seems to me very close to

0 < tau < kappa = t, where t goes from 2 to T (my question)

But as you point out, in the first case we may not actually pick all the change points that would be generated in the second case. Still, I feel like the potential is there, as the first case should generate the same "peaks" of q-values; it's not guaranteed, only more likely.
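A toy formulation of the second procedure, with detect standing in for a full Otava run (illustrative names, not a real API):

def union_over_prefixes(series, detect):
    '''Re-run detection on every prefix and keep the union of all change points found.'''
    found = set()
    for t in range(2, len(series) + 1):
        found |= set(detect(series[:t]))
    return found

Varying kappa sweeps the same prefixes inside a single run but commits only to the argmax at each step, which is why the two are close but not identical.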

> Next, if I understand correctly why you are asking this question, it is because of the issues described in the Hunter paper. Namely, the claim:

My motivation for the question was to understand whether this fully explains the phenomenon of change points that first are found and then disappear. It seems to me it mostly does, but we cannot say for certain it "fully" does so in all scenarios.

> def figure1_test(N):
> ...     base = 440 + np.random.randn(N) * 5
> ...     drop = 400 + np.random.randn(N) * 5
> ...     recover = 445 + np.random.randn(N) * 5
> ...     series = np.concatenate((base, drop, recover))

I think to generate a data set that the Hunter paper was concerned with, you need the drop to be short, maybe even 1-2 points only:

 drop = 400 + np.random.randn(2) * 5

@Sowiks
Contributor Author

Sowiks commented Nov 17, 2025

> I think to generate a data set that the Hunter paper was concerned with, you need the drop to be short, maybe even 1-2 points only:
>
> drop = 400 + np.random.randn(2) * 5

Correct me if I'm wrong, but my understanding was that there are two separate problems:

  1. Disappearing of previously found critical points.
  2. Not detecting the critical points in the first place (because the number of abnormal points is small).

@henrikingo
Contributor

> > I think to generate a data set that the Hunter paper was concerned with, you need the drop to be short, maybe even 1-2 points only:
> >
> > drop = 400 + np.random.randn(2) * 5
>
> Correct me if I'm wrong, but my understanding was that there are two separate problems:
>
> 1. Disappearing of previously found critical points.
> 2. Not detecting the critical points in the first place (because the number of abnormal points is small).

No, these are the same problem. The change points disappear when the interval/window they are in grows larger. I always assumed this was a feature: in a short timeseries, say 50-100 points, MongoDB e-divisive with typical parameters would ignore spikes that last a single point only, and might alert for a plateau of 2-3 points that then returns to the original level. (But even then it would only produce 1 change point, because the original MongoDB implementation needed a hard-coded 3 points before it would alert anything at all, so it is not possible to find 2 neighboring change points. This is from the Matteson paper; their R reference implementation I believe defaulted to a leading 30 points or so. Which would be a long time to wait for a Jira ticket if it was nightly builds!)

...where was I... So then, if the series keeps growing, my interpretation is that the short-lived change becomes less significant compared to the entire series, so eventually it is ignored by the algorithm, just as if it were a single point. Conversely, even a single point could trigger an alert if it was large enough. (At least assuming that the series on both of its sides aren't perfectly constant.)

The fix of adding a window is based on the above understanding: it creates a situation where the local computation doesn't take into account more than a small number of local points.

And this is why I asked earlier whether Kappa is now equivalent to observing a series grow from 1 point and computing the algorithm for every added point.

@Sowiks
Contributor Author

Sowiks commented Nov 18, 2025

I see, thank you for the clarification. I'll need to think about it.

@Sowiks
Contributor Author

Sowiks commented Nov 23, 2025

  1. Brought comments, code, and variable names across all files to the same indexing notation for sub-series. Now everything is defined in Python slice notation, i.e., array[start : end]: the first index start is always included, the last index end is always excluded. The usage of the variables start and end is also consistent throughout the files. The variable name stop is not used, except when working with the Python built-in slice object directly (in those cases the slice object's stop field is equal to the variable end).
  2. Added comments describing the original and split-merge change point detection algorithms. Got rid of MongoDB implementation references in the comments.
  3. Corrected an assert statement in analysis.py.
  4. Added comments in analysis.py explaining None values for intervals.start and intervals.stop.
  5. Renamed the _calculated_distances method to _calculate_pairwise_differences in the PairDistanceCalculator class in calculator.py.
  6. Renamed the significance threshold variable alpha to max_pvalue across all files for consistency and clarity.
  7. Corrected the typo in the variable name permurations to permutations across all files.
  8. Corrected an annotation typo for the variable power in calculator.py (int to float).
  9. Corrected a typo in calculator.py (matix to matrix).

@Sowiks
Contributor Author

Sowiks commented Nov 23, 2025

I didn't fix the recompute calls for the case weakest_cp_index == max(index) in analysis.py. I want to do this in the next PR. I have a suspicion that there is a bug there (both in the master and current implementations), but I need more time to investigate.

@henrikingo
Contributor

Thank you @Sowiks, I appreciate the attention to commenting the algorithm as understandably as possible, and the attention to detail in keeping a certain consistency in both variable naming and indexing. You already have my approval from the previous review, but I wanted to re-affirm here that I don't think I have any further comments on this PR.

We should, however, wait until Tuesday to hear from @Gerrrr how we proceed with the 0.7.0 release. I'm sensing we might create a separate branch for the backward-compatible releases. If not, then we'll just have to wait a few weeks while we iterate through the ASF voting process to get the release out of the way.

@henrikingo
Contributor

> I didn't fix the recompute calls for the case weakest_cp_index == max(index) in analysis.py. I want to do this in the next PR. I have a suspicion that there is a bug there (both in the master and current implementations), but I need more time to investigate.

First question is whether this split-merge-recompute is even needed after the re-implementation. If the problem that it fixes goes away with your addition of Kappa, then we should remove it.

@Gerrrr
Contributor

Gerrrr commented Nov 28, 2025

Great work, @Sowiks! FYI I cut a separate branch for the next 0.7.0 release - https://github.com/apache/otava/tree/0.7, so feel free to merge this PR whenever you are ready.

@henrikingo
Contributor

I guess we'll have to merge it, but @Sowiks do you have the contributor agreement (ICLA) signed with the ASF? (I didn't find a place where I could check myself.)

@dave2wave
Member

I checked and an ICLA has not yet been filed.

@henrikingo
Contributor

@Sowiks : https://www.apache.org/licenses/contributor-agreements.html

Short version is: download the ICLA PDF, sign it either with a pen or GnuPG, and email it to secretary@apache.org.

@Gerrrr
Contributor

Gerrrr commented Dec 2, 2025

@dave2wave @michaelsembwever @henrikingo to my knowledge, a casual contributor does not have to sign an ICLA. For example, I recall signing the ICLA only right before becoming an Apache Cassandra committer. Isn't that so?

@henrikingo
Contributor

At least the guide I link to above uses words like "all contributors" and "every developer". We should maybe take this to the mailing list if you want to debate it more thoroughly.

@Sowiks
Contributor Author

Sowiks commented Dec 4, 2025

Sent the form.

@henrikingo henrikingo merged commit 5c5abc7 into apache:master Dec 4, 2025
4 checks passed
@henrikingo
Contributor

Yay!
