Replace current outlier test by the Bonferroni Outlier Test #5

@schnorr

As of today, there is no outlier detection for sparse linear algebra (qrmumps), and detection for dense linear algebra (cholesky) relies on the interquartile range (IQR). IQR is weak because it uses no performance model to anticipate the expected behavior. Since we do have fair performance models for both cholesky and qrmumps, we can use the Bonferroni Outlier Test instead (available in the car package through the outlierTest function). Here's a code snippet to classify tasks as outliers once the outlierTest function has been called with a model:

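 library(car)        # provides outlierTest
 library(tidyverse)  # tibble, dplyr verbs and %>%

 # n.max=Inf lifts the default limit of 10 reported observations
 out <- outlierTest(fit, n.max=Inf)
 # The names of out$bonf.p are the row labels of the reported observations
 out.tibble <- tibble(Order = out$bonf.p %>% names %>% as.integer,
                      Bonferroni = out$bonf.p) %>%
    filter(Bonferroni < 0.5)
 # Rebuild the same row index on df and mark the reported rows
 df %>%
    mutate(Order = 1:n()) %>%
    mutate(Outlier = Order %in% out.tibble$Order) %>%
    select(-Order)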

Here, fit contains the model. Note that the order of the observations given to the model matters, since outlierTest reports outliers by their index. We therefore need to recreate that order on the original df observations and then mark the observations that the Bonferroni test detected as outliers. The scalability of this approach is yet to be evaluated.
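To make the index matching concrete, here is a minimal self-contained sketch on synthetic data. Everything in it is an assumption for illustration only: the GFlop and Duration columns, the linear model, the injected anomaly at row 50, and the 0.05 cutoff are hypothetical and are not the actual cholesky or qrmumps performance models.

```r
library(car)     # outlierTest
library(dplyr)   # mutate, row_number, %>%
library(tibble)  # tibble

set.seed(42)

# Hypothetical task data: duration grows linearly with a made-up cost metric.
df <- tibble(GFlop = runif(200, 1, 10)) %>%
  mutate(Duration = 2 * GFlop + rnorm(n(), sd = 0.1))

# Inject one clearly anomalous task at row 50.
df$Duration[50] <- df$Duration[50] + 5

# Stand-in for the performance model (assumption: a simple linear fit).
fit <- lm(Duration ~ GFlop, data = df)

# Report every observation below the cutoff instead of only the top 10.
out <- outlierTest(fit, cutoff = 0.05, n.max = Inf)

# The names of out$bonf.p are the row labels of the reported observations.
# Filter again because, when nothing is below the cutoff, outlierTest still
# reports the observation with the largest studentized residual.
keep <- !is.na(out$bonf.p) & out$bonf.p < 0.05
flagged <- as.integer(names(out$bonf.p)[keep])

# Mark rows of df whose position matches a flagged row label.
df <- df %>%
  mutate(Outlier = row_number() %in% flagged)
```

The matching works here because df has the default row labels 1..n, so the labels reported by outlierTest coincide with row positions. If df is filtered or reordered between fitting and classification, the positions no longer line up and the wrong tasks get flagged.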
