Skip to content

AndrewC1998/changepoint.online

 
 

Repository files navigation

changepoint.online

Online streaming changepoint detection in R Package

This repository is a fork of my rkillick/changepoint.online (GSoC 2018) projects and aims to keep the same public interface while maintaining the codebase.


Simple Explanation of Project

Given a time-ordered sequence $x_1, x_2, \dots$, we want to detect regime changes (changepoints) as data arrives, rather than retrospectively at the end of the sample. Typical regimes are piecewise-constant parameters such as:

  • change in mean,
  • change in variance,
  • change in mean and variance,
  • (optionally) a nonparametric notion of change via energy-style statistics (“ECP”).

The package provides online versions of the familiar changepoint workflow by splitting the offline calls into:

  1. an initialisation step (create state), then
  2. repeated update steps (append new data, update estimates).

Installation

From GitHub

# install.packages("remotes")
remotes::install_github("AndrewC1998/changepoint.online")

From source (local)

R CMD build .
R CMD INSTALL changepoint.online_*.tar.gz

Quick start

(A) Online change in mean

library(changepoint.online)

set.seed(1)
x <- c(rnorm(50, 0, 1), rnorm(50, 5, 1), rnorm(50, 10, 1), rnorm(50, 3, 1))
y <- c(rnorm(50, 15, 1), rnorm(50, 25, 1), rnorm(50, 33, 1), rnorm(50, 7, 1))

state <- ocpt.mean.initialise(x)          # initialise on initial batch
state <- ocpt.mean.update(state, y)       # update with new batch

cpts(state)                               # estimated changepoint locations (end-of-segment indices)

(B) Online change in variance

set.seed(1)
x <- c(rnorm(100, 0, 1), rnorm(100, 0, 3))
state <- ocpt.var.initialise(x, test.stat="Normal")
state <- ocpt.var.update(state, rnorm(100, 0, 0.5))
cpts(state)

(C) Nonparametric “ECP” style online change

set.seed(1)
x <- matrix(c(rnorm(100, 50, 1), rnorm(100, 5, 1)), ncol=1)
y <- matrix(c(rnorm(100, 15, 1), rnorm(100, 25, 1)), ncol=1)

state <- ocpt.np.initialise(x)
state <- ocpt.np.update(state, y, K=3)

state

Core mathematics

1) Penalised segmentation objective (offline view)

Most routines here can be understood through the standard penalised segmentation problem:

Choose changepoints $0=\tau_0 &lt; \tau_1 &lt; \dots &lt; \tau_m &lt; \tau_{m+1}=t$ to minimise

$$\sum_{j=0}^{m} \mathcal{C}\big(x_{\tau_j+1:\tau_{j+1}}\big) + m ,\beta,$$

where $\mathcal{C}(\cdot)$ is a segment cost and $\beta$ is a penalty controlling the number of changepoint (AIC/BIC/MBIC/Manual/Asymptotic/CROPS-style options).

2) PELT dynamic programming (exact multiple changepoints)

For exact multiple changepoints, PELT solves the recursion

$$F(t) = \min_{0 \le s < t} { F(s) + \mathcal{C}(x_{s+1:t}) + \beta },$$

with a pruning rule that removes candidate last-changepoint locations when they can no longer be optimal. This yields exact changepoints with linear expected time in the offline setting.

In this package, PELT.online operates on summary statistics so that segment costs can be computed quickly.

3) “Online” update viewpoint

At time $t$, you have state that contains:

  • cumulative summary statistics needed to compute $\mathcal{C}(x_{s+1:t})$ fast,
  • the current best segmentation according to the chosen method,
  • any tuning constants (penalty, minseglen, distribution choice, etc.).

When new data $x_{t+1}, \dots, x_{t+k}$ arrives, the update step:

  1. extends the summary statistics,
  2. recomputes the relevant dynamic program quantities for the new horizon,
  3. returns an updated ocpt* object with updated changepoint estimates.

4) Nonparametric “ECP” option

When test.stat="ECP" (or using the explicit ocpt.np.*), the algorithm switches to a nonparametric change statistic based on distances between observations in different segments (with a moment index alpha and window size delta exposed as tuning parameters). This is useful when parametric assumptions (Normal/Poisson/etc.) are questionable.


Key user-facing functions

Main entrypoints

Mean changes

  • ocpt.mean.initialise(data, ...) / alias ocpt.mean.initialize(...)
  • ocpt.mean.update(previousanswer, newdata)

Variance changes

  • ocpt.var.initialise(data, ...) / alias ocpt.var.initialize(...)
  • ocpt.var.update(previousanswer, newdata)

Mean + variance changes

  • ocpt.meanvar.initialise(data, ...) / alias ocpt.meanvar.initialize(...)
  • ocpt.meanvar.update(previousanswer, newdata)

Nonparametric / ECP

  • ocpt.np.initialise(data, ...) / alias ocpt.np.initialize(...)
  • ocpt.np.update(previousanswer, newdata, K)

Developer-level (advanced)

  • PELT.online(sumstat, pen, cost_func, ...)
    A low-level PELT step operating on a matrix of summary statistics (exported for developers; no argument checking).
  • PELT.online.initialise(...), PELT.online.update(...)
    Convenience wrappers used internally when method="PELT".

References

  • Killick, R., Fearnhead, P., & Eckley, I. A. (2012). Optimal detection of changepoints with a linear computational cost. JASA, 107(500), 1590–1598.

License

GPL (see DESCRIPTION / LICENSE if present).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages

  • R 73.8%
  • C++ 11.1%
  • TeX 9.0%
  • C 6.1%