dependence on scaling of data

I noticed that the changepoints found depend on the scaling of the data, which seems undesirable (and unnecessary).

For example:
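```{r}
library(changepoint)  # provides cpt.mean

set.seed(51)
# piecewise-constant true mean: 8 segments, 500 points in total
true_mean = rep(c(-0.2,0.1,1,-0.5,0.2,-0.5,0.1,-0.2),c(137,87,17,49,29,52,87,42))
genomdat = list(x = rnorm(500, sd=0.2) + true_mean, true_mean=true_mean)
```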

The `cpt.mean` default does not find any changepoints:
```{r}
genomdat.cp = cpt.mean(genomdat$x,method="PELT")
plot(genomdat.cp)
```

But if we multiply the data by 10, we find many changepoints:
```{r}
genomdat.cp = cpt.mean(10*genomdat$x,method="PELT")
plot(genomdat.cp)
```

I speculate that perhaps the cost function (log-likelihood) implicitly assumes the variance is 1?
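To make that speculation concrete, here is a minimal base-R sketch. It is not the package's code: `rss_cost` and `best_split_gain` are hypothetical helpers, the data are a small simulated example, and the penalty value is only illustrative. It assumes the segment cost is the residual sum of squares, which is what a normal log-likelihood reduces to (up to constants) when the variance is fixed at 1. Under that assumption, multiplying the data by `c` multiplies the gain from adding a changepoint by `c^2`, while a penalty that depends only on `n` stays the same.

```r
# Hypothetical helper: residual sum of squares of a segment around its mean.
# Up to constants this is -2 * log-likelihood if the variance is fixed at 1.
rss_cost <- function(x) sum((x - mean(x))^2)

# Hypothetical helper: largest reduction in cost achievable with one changepoint.
best_split_gain <- function(x) {
  n <- length(x)
  split_costs <- sapply(1:(n - 1), function(k) {
    rss_cost(x[1:k]) + rss_cost(x[(k + 1):n])
  })
  rss_cost(x) - min(split_costs)
}

set.seed(51)
# Simulated example: one mean shift of 0.3 with noise sd 0.2
x <- c(rnorm(100, mean = 0, sd = 0.2), rnorm(100, mean = 0.3, sd = 0.2))

gain_raw    <- best_split_gain(x)
gain_scaled <- best_split_gain(10 * x)
penalty     <- 3 * log(length(x))  # illustrative penalty; depends only on n

# The gain grows by a factor of 10^2, but the penalty does not change,
# so the decision "gain > penalty" can flip when the data are rescaled.
print(all.equal(gain_scaled, 100 * gain_raw))
print(c(raw = gain_raw > penalty, scaled = gain_scaled > penalty))
```

If this is indeed what is happening, one possible workaround would be to divide the data by an estimate of the noise standard deviation (for example `mad(diff(x)) / sqrt(2)`) before calling `cpt.mean`, though I have not checked how the package intends this to be handled.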

Incidentally, while digging through the code to try to understand the issue, I noticed that some places use "norm.mean" whereas others use "mean.norm". I'm not sure whether that was intended?
```
Matthews-MacBook-Air-2:changepoint stephens$ grep norm.mean src/*
src/BinSeg_one_func_minseglen.c:     char **cost_func; //Descibe the cost function used i.e. norm.mean.cost (change in mean in normal distributed data)  
src/BinSeg_one_func_minseglen.c:  {"norm.mean", mll_mean},
src/BinSeg_one_func_minseglen.c:  {"norm.meanvar", mll_meanvar},
Matthews-MacBook-Air-2:changepoint stephens$ grep mean.norm src/*
src/BinSeg_one_func_minseglen.c:   else if (strcmp(*cost_func,"mean.norm")==0){
src/BinSeg_one_func_minseglen.c:   else if (strcmp(*cost_func,"mean.norm.mbic")==0){
src/PELT_one_func_minseglen.c:   else if (strcmp(*cost_func,"mean.norm")==0){
src/PELT_one_func_minseglen.c:   else if (strcmp(*cost_func,"mean.norm.mbic")==0){
```

 
