Speeding up TajD code #2

@murallcl

Arnaud's R code for Tajima's D is up in the TajD_code folder, along with the files needed to do a small run (250 sequences over the first wave).
It does three things:

  • run and box-plot TajD over time for all sequences
  • run and box-plot TajD per variant over time
  • plot time series of case counts

For the paper on Quebec wave 1 and wave 2, he used bins of 1 month and, for each month, 200 subsamples of 20 randomly selected sequences each. The bin width is a parameter we can change (e.g. if there's too much contraction/expansion within one month), but a month is generally a reasonable time interval. Without a formal test, he estimates that a minimum of 10-15 sequences per time interval is needed to get a consistent/stable TajD estimate.
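To make the subsampling scheme concrete, here is a minimal Python sketch (not Arnaud's R code) of Tajima's D on one set of sequences plus the repeated-subsample loop for a single time bin. It assumes a clean alignment of equal-length sequences with no gaps or ambiguous bases; the names `tajimas_d` and `subsampled_d` are hypothetical, and the defaults (200 subsamples of 20 sequences) come from the description above.

```python
import math
import random
from collections import Counter


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences; None if undefined."""
    n = len(seqs)
    if n < 4:
        return None  # the variance term is zero for n < 4
    seg_sites = 0   # S: number of segregating (polymorphic) columns
    pair_diffs = 0  # total pairwise differences, summed over columns
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # pairs that differ at this column: (n^2 - sum(c_i^2)) / 2
            pair_diffs += (n * n - sum(c * c for c in counts.values())) // 2
    if seg_sites == 0:
        return None  # no polymorphism, D is undefined
    pi = pair_diffs / math.comb(n, 2)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


def subsampled_d(seqs, n_reps=200, k=20, seed=0):
    """D values from n_reps random subsamples of k sequences in one bin."""
    if len(seqs) < k:
        return []  # too few sequences in this bin to subsample
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    values = (tajimas_d(rng.sample(seqs, k)) for _ in range(n_reps))
    return [d for d in values if d is not None]
```

The list returned by `subsampled_d` for each month is what a per-month box plot would summarize. Real data would need an explicit decision on how to treat gaps and ambiguous bases, which this sketch ignores.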
The aim is to run this analysis on Canadian sequences (overall average and per variant of interest, e.g. wildtype, alpha, delta, omicron), which is a larger dataset covering a longer period of time than previous uses.
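One way to organize the overall and per-variant runs is to bin sequences by (month, variant) up front and drop bins that are too small. The Python sketch below is only an illustration: the record fields `date` (an ISO `YYYY-MM-DD` string), `variant`, and `seq` are hypothetical, the `"all"` key for the overall average is an invented convention, and the default `min_per_bin=15` is the upper end of the 10-15 estimate above.

```python
from collections import defaultdict


def bin_by_month_and_variant(records, min_per_bin=15):
    """Group sequences by (YYYY-MM, variant), plus an overall 'all' bin per month."""
    bins = defaultdict(list)
    for rec in records:
        month = rec["date"][:7]  # assumes ISO dates, e.g. "2020-03-14" -> "2020-03"
        bins[(month, rec["variant"])].append(rec["seq"])
        bins[(month, "all")].append(rec["seq"])  # hypothetical key for the overall run
    # Keep only bins with enough sequences for a stable estimate
    return {key: seqs for key, seqs in bins.items() if len(seqs) >= min_per_bin}
```

Changing the bin width (e.g. to two weeks) would only require changing how `month` is derived from the date.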
The code is already parallelized in R (using the doParallel and foreach libraries), but we might be able to parallelize it further, since most steps are independent of one another. Getting it to run on CC would then be ideal!
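Since the steps are mostly independent, one possible approach is to flatten the work into a single list of (bin, replicate) tasks, each with its own seed, so that any worker can run any task in any order. The Python sketch below illustrates that idea only; it makes no claim about how the existing R code is structured. The names `make_tasks`, `draw_indices`, and `run_all` are hypothetical, and the per-task work is a placeholder that just draws the subsample indices where the real worker would compute TajD.

```python
import random
from functools import partial
from itertools import product


def make_tasks(bin_keys, n_reps=200):
    """One task per (bin, replicate); tasks do not depend on each other."""
    return list(product(bin_keys, range(n_reps)))


def draw_indices(task, bin_sizes, k=20):
    """Placeholder worker: pick k sequence indices for one task."""
    key, rep = task
    # Seed from the task itself, so the result does not depend on
    # which worker runs it or in what order.
    rng = random.Random(f"{key}|{rep}")
    return task, sorted(rng.sample(range(bin_sizes[key]), k))


def run_all(tasks, bin_sizes, k=20, map_fn=map):
    """Run every task; map_fn can be swapped for a parallel map."""
    worker = partial(draw_indices, bin_sizes=bin_sizes, k=k)
    return dict(map_fn(worker, tasks))
```

With the default `map_fn=map` this runs serially; passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` should spread the same tasks over local cores, and on a cluster the task list could be split into chunks across array jobs. In R, the analogous change would presumably be a single `foreach` over the flattened task list rather than nested loops.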

Labels: Low priority (This is unlikely to be worked up soon.); Visualization (Work needed to improve visualization)
