dependence on scaling of data #26
Open
I noticed that the scaling of the data matters, which seems undesirable (and unnecessary).
For example:
library(changepoint)  # provides cpt.mean
set.seed(51)
true_mean = rep(c(-0.2,0.1,1,-0.5,0.2,-0.5,0.1,-0.2),c(137,87,17,49,29,52,87,42))
genomdat = list(x = rnorm(500, sd=0.2) + true_mean, true_mean=true_mean)
The cpt.mean default does not find any changepoints:
genomdat.cp = cpt.mean(genomdat$x,method="PELT")
plot(genomdat.cp)
But if we multiply the data by 10 we find many changepoints.
genomdat.cp = cpt.mean(10*genomdat$x,method="PELT")
plot(genomdat.cp)
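One possible workaround, sketched here and not taken from the package, is to divide the data by an estimate of the noise standard deviation before calling cpt.mean. The helper name sigma_hat and the choice of estimator (MAD of first differences, divided by sqrt(2)) are my own assumptions. Since that estimate scales with the data, the standardized series is the same whether or not the input was multiplied by 10, so the two calls should agree.

```r
# Hypothetical helper (not part of changepoint): robust noise sd estimate.
# Differencing removes the piecewise-constant mean except at the few change
# points; diff() of iid noise has sd sqrt(2) * sigma, hence the division.
sigma_hat <- function(x) mad(diff(x)) / sqrt(2)

set.seed(51)
true_mean <- rep(c(-0.2, 0.1, 1, -0.5, 0.2, -0.5, 0.1, -0.2),
                 c(137, 87, 17, 49, 29, 52, 87, 42))
x <- rnorm(500, sd = 0.2) + true_mean

# Standardize the raw and the rescaled series; both end up on the same scale.
z1  <- x / sigma_hat(x)
z10 <- (10 * x) / sigma_hat(10 * x)

# Only run the changepoint calls if the package is installed.
if (requireNamespace("changepoint", quietly = TRUE)) {
  cp1  <- changepoint::cpt.mean(z1,  method = "PELT")
  cp10 <- changepoint::cpt.mean(z10, method = "PELT")
  print(changepoint::cpts(cp1))
  print(changepoint::cpts(cp10))
}
```

This only removes the dependence on a global rescaling; whether the estimator is appropriate for a given data set is a separate question.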
I suspect that the cost function (the log-likelihood) implicitly assumes that the variance is 1?
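To illustrate why a unit-variance assumption would produce this behaviour, here is a small base-R sketch. It assumes the segment cost is the residual sum of squares around the segment mean (a Gaussian log-likelihood with variance fixed at 1, up to constants) and uses log(n) as a stand-in penalty. Neither is copied from the package source, and rss and split_gain are hypothetical helper names. Under that assumption, the cost reduction from a split grows by a factor of 100 when the data are multiplied by 10, while the penalty stays the same.

```r
# Assumed cost: residual sum of squares of a segment around its own mean.
rss <- function(x) sum((x - mean(x))^2)

# Reduction in total cost from splitting x into x[1:tau] and x[(tau+1):n].
split_gain <- function(x, tau) {
  n <- length(x)
  rss(x) - rss(x[1:tau]) - rss(x[(tau + 1):n])
}

set.seed(51)
true_mean <- rep(c(-0.2, 0.1, 1, -0.5, 0.2, -0.5, 0.1, -0.2),
                 c(137, 87, 17, 49, 29, 52, 87, 42))
x <- rnorm(500, sd = 0.2) + true_mean

tau <- 137                               # first true change point
gain_raw    <- split_gain(x, tau)        # gain on the original scale
gain_scaled <- split_gain(10 * x, tau)   # gain after multiplying by 10
penalty     <- log(length(x))            # illustrative penalty, does not scale

print(c(raw = gain_raw, scaled = gain_scaled, penalty = penalty))
```

A split is accepted only when its gain exceeds the penalty, so a fixed penalty compared against a gain that scales with the square of the data would explain why rescaling changes the number of changepoints found.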
Incidentally, while digging around the code to see if I could understand the issue, I noticed that some places in the code use "norm.mean" whereas others use "mean.norm". I'm not sure that was intended?
Matthews-MacBook-Air-2:changepoint stephens$ grep norm.mean src/*
src/BinSeg_one_func_minseglen.c: char **cost_func; //Descibe the cost function used i.e. norm.mean.cost (change in mean in normal distributed data)
src/BinSeg_one_func_minseglen.c: {"norm.mean", mll_mean},
src/BinSeg_one_func_minseglen.c: {"norm.meanvar", mll_meanvar},
Matthews-MacBook-Air-2:changepoint stephens$ grep mean.norm src/*
src/BinSeg_one_func_minseglen.c: else if (strcmp(*cost_func,"mean.norm")==0){
src/BinSeg_one_func_minseglen.c: else if (strcmp(*cost_func,"mean.norm.mbic")==0){
src/PELT_one_func_minseglen.c: else if (strcmp(*cost_func,"mean.norm")==0){
src/PELT_one_func_minseglen.c: else if (strcmp(*cost_func,"mean.norm.mbic")==0){