Description
I thought this could be a fun task for a new contributor interested in change point detection. The primary focus is on generating test cases, but the work could also make for an interesting blog post educating a wider audience on what exactly change detection does with benchmarking results. Finally, this set of test cases will be an excellent benchmark with which we can compare Otava against competing tools and algorithms (e.g. the anomaly detection frequently offered by services like Datadog or Grafana).
Sub-tasks:
- Create the below "basic building block" timeseries, as well as combinations of them. Preferably use Python; see the sketch after the assumptions below. (Generating a file with CSV data some other way, for example with Excel, is kind of possible too.)
- Create Otava test cases with pytest that use the data sets.
- Also test other algorithms and tools
- Write a blog post with your findings
- Assumptions:
The problem space is one where a performance test is run repeatedly, generating a time series of measurements. Ideally this would produce an infinitely long constant timeseries, but in practice the output of the performance test includes a random noise component. Strictly speaking we do not know the distribution of this noise, and therefore not the distribution of the test results either, but in practice assuming a normal distribution seems to work well. Finally, there will be performance regressions and improvements in the system under test. These are discrete steps up or down, after which the timeseries again continues as a function of constant + noise.
In other words, things we do NOT encounter in this domain are, for example, trending, where the timeseries keeps increasing or decreasing at a constant rate, or cyclic behavior, such as seasonality at specific times of day, week or year.
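To make the assumed model concrete, here is a minimal Python sketch (numpy assumed; the function name and all values are just illustrative) of a constant series with Gaussian noise and an optional discrete step:

```python
import numpy as np

def constant_plus_noise(length, mean, sigma, step_at=None, step_size=0.0, seed=0):
    """Constant series with Gaussian noise and an optional discrete step."""
    rng = np.random.default_rng(seed)
    series = np.full(length, mean, dtype=float)
    if step_at is not None:
        series[step_at:] += step_size  # a regression (or improvement) as a step
    return series + rng.normal(0.0, sigma, size=length)

# 500 points around 100.0 with a 10% regression halfway through:
s = constant_plus_noise(500, mean=100.0, sigma=1.0, step_at=250, step_size=10.0)
```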
- The basic building blocks
These timeseries can themselves already be used as test cases (a Python sketch of generators for them follows the list):
Parameter L = [50, 500] is the length of all these timeseries. For simplicity, most of the below also have a negative counterpart, which is omitted (e.g. a decrease vs. an increase).
Let's begin:
Constant: S = x, x, x, x...
Noise, normally distributed: S = x1, x2, x3, ... where xi ~ N(0, sigma)
- for benchmarking purposes, we can define a maximum range such that all values fall within the 99.99th percentile, or roughly 4 standard deviations. This makes it possible for an algorithm to correctly detect 100% of cases, as the noise component is then not unbounded.
Noise, uniformly distributed, aka white noise or static noise: S = random(min, max)
Outlier, aka anomaly, is a single deviating point: S = x, x, x, x, x, x', x, x... where x' != x
Step function, aka single change point: S = x1, x1, x1, x2, x2, x2, x2...
Regression + fix soon after: S = x1, x1, ..., x1, x2, ..., x2, x3, x3, x3...
- The number of x2 points is small compared to x1 and x3, but at least 2 points
- Special case: x1 == x3
- Special case: x2 is a single point. This can happen, but mathematically it is indistinguishable from a single outlier (see above).
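A sketch of generators for these basic building blocks (numpy assumed; all values and names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(42)
L = 500  # or 50

constant = np.full(L, 100.0)

# Normal noise, clipped to ~4 standard deviations (the 99.99% range above)
# so that the noise component is bounded.
sigma = 1.0
normal_noise = np.clip(rng.normal(0.0, sigma, L), -4 * sigma, 4 * sigma)

# Uniformly distributed ("white") noise.
uniform_noise = rng.uniform(-2.0, 2.0, L)

# Single outlier / anomaly.
outlier = constant.copy()
outlier[L // 2] += 10.0

# Step function, i.e. a single change point.
step = np.concatenate([np.full(L // 2, 100.0), np.full(L - L // 2, 110.0)])

# Regression + fix soon after (shown with the x1 == x3 special case);
# the x2 segment is short but at least 2 points long.
regression_fix = np.concatenate([np.full(L // 2, 100.0),
                                 np.full(5, 110.0),
                                 np.full(L - L // 2 - 5, 100.0)])
```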
- More interesting phenomena (a code sketch follows this list)
Banding is a form of noise (unwanted change) where the results oscillate randomly between two values:
S = x1, x2, x2, x1, x2, x1, x1, x1, x2, x2, x1, x2, x2...
Typically:
abs(x2 - x1) << x1
and also:
abs(x2 - x1) > std dev when random noise is mixed in, otherwise the two bands blur into the noise
Constant mean, change in variance: S = N(0, sigma1), ..., N(0, sigma2), ...
sigma2 > sigma1
(sigma1 > sigma2 is the negative case)
Constant mean and variance, but a phase change: S = cos(x)..., sin(x)...
Multiple consecutive changes: S = x0, x0, x0, ..., x1, x2, ..., xn, xn, xn...
Where x0 < x1 < x2 ... < xn
- What the right behavior is here depends on the specific problem space. For perf testing it is possible that multiple independent improvements were merged back to back.
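These could be sketched along the same lines (numpy assumed; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
L = 500

# Banding: results oscillate randomly between two levels x1 and x2,
# with abs(x2 - x1) << x1 but larger than the noise std dev.
x1, x2 = 100.0, 103.0
banding = rng.choice([x1, x2], size=L)

# Constant mean, change in variance.
variance_change = np.concatenate([rng.normal(0.0, 1.0, L // 2),
                                  rng.normal(0.0, 3.0, L - L // 2)])

# Constant mean and variance, but a phase change: cos(x) switches to sin(x).
x = np.linspace(0.0, 20.0 * np.pi, L)
phase_change = np.concatenate([np.cos(x[: L // 2]), np.sin(x[L // 2:])])

# Multiple consecutive change points with x0 < x1 < ... < xn,
# e.g. several independent improvements merged back to back.
levels = [100.0, 104.0, 107.0, 112.0]
multi_step = np.concatenate([np.full(L // len(levels), v) for v in levels])
```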
- Generate all possible combinations of the above, including with itself (e.g. more than a single outlier, more than a single phase change...).
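One way to build the combinations, treating the building blocks as additive components (just a sketch, numpy assumed; names are illustrative):

```python
import numpy as np

def combine(base, *components):
    """Sum a base series with additive components of equal length."""
    out = np.asarray(base, dtype=float).copy()
    for c in components:
        out += c
    return out

# Example: step change + bounded normal noise + two outliers.
rng = np.random.default_rng(0)
L = 500
step = np.where(np.arange(L) < L // 2, 100.0, 110.0)
noise = np.clip(rng.normal(0.0, 1.0, L), -4.0, 4.0)
series = combine(step, noise)
series[[100, 400]] += 8.0  # more than a single outlier
```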
- Verify that Otava handles all of the above.
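I won't prescribe the exact Otava invocation here, but assuming a hypothetical wrapper detect_change_points(series) that returns the indices Otava reports, a pytest case could look roughly like this:

```python
import numpy as np
import pytest

# Hypothetical wrapper around Otava's change point detection; the real
# call depends on whichever Otava API the test harness settles on.
from otava_wrapper import detect_change_points

@pytest.mark.parametrize("length", [50, 500])
def test_single_step_is_detected(length):
    rng = np.random.default_rng(1)
    series = np.where(np.arange(length) < length // 2, 100.0, 110.0)
    series = series + np.clip(rng.normal(0.0, 1.0, length), -4.0, 4.0)

    change_points = detect_change_points(series)

    # Exactly one change point, at (or near) the true step location.
    assert len(change_points) == 1
    assert abs(change_points[0] - length // 2) <= 2
```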