Description
Arnaud's R code for Tajima's D is up in the TajD_code folder, along with the files needed to do a small run (250 sequences over the first wave).
It does three things:
- run and box-plot TajD over time for all sequences
- run and box-plot TajD per variant over time
- plot time series of case counts
For the paper on Quebec wave 1 and wave 2, he used bins of 1 month and 200 subsamples per month, each of 20 randomly selected sequences. The bin width is a parameter choice we can change (e.g. if there's too much contraction/expansion within one month), but one month is generally a reasonable time interval. Without a formal test, he estimates that a minimum of 10-15 sequences per time interval is needed to get a consistent/stable TajD estimate.
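To make the subsampling scheme concrete, here is a minimal Python sketch (not Arnaud's R code) of Tajima's D computed on repeated random subsamples within each time bin. It assumes the sequences are already aligned, of equal length, and grouped by month. The names `tajimas_d`, `subsampled_d` and `seqs_by_bin` are hypothetical. Treating every character (including gaps or N) as an allele and skipping bins smaller than the subsample size are simplifications made for illustration only.

```python
import math
import random
from collections import Counter


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences; None if undefined."""
    n = len(seqs)
    if n < 4:  # the variance term is zero for n < 4
        return None
    seg_sites = 0   # S: number of segregating sites
    pair_diffs = 0  # total differences summed over all sequence pairs
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # number of pairs that differ at this site
            pair_diffs += (n * n - sum(c * c for c in counts.values())) // 2
    if seg_sites == 0:
        return None
    pi = pair_diffs / (n * (n - 1) / 2)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / math.sqrt(variance)


def subsampled_d(seqs_by_bin, n_subsamples=200, subsample_size=20, seed=0):
    """Return {bin label: list of D values}, one value per random subsample."""
    rng = random.Random(seed)
    results = {}
    for label, seqs in seqs_by_bin.items():
        if len(seqs) < subsample_size:
            continue  # assumption: skip bins too small to subsample
        values = []
        for _ in range(n_subsamples):
            d = tajimas_d(rng.sample(seqs, subsample_size))
            if d is not None:
                values.append(d)
        results[label] = values
    return results
```

The defaults mirror the settings quoted above (200 subsamples of 20 sequences per bin). The per-bin lists of D values are what would feed the box plots over time.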
The aim is to run this analysis for Canadian sequences (overall average and per variant of interest, e.g. wildtype, alpha, delta, omicron), which is a larger dataset covering a longer period of time than in previous uses.
The code is already parallelized in R (using the doParallel and foreach libraries); however, we might be able to parallelize it further, since most steps are not interdependent. Then, getting it to run on CC would be ideal!
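As a rough illustration of why the steps parallelize well, the Python sketch below (not the existing R code) treats every (time bin, subsample replicate) pair as an independent task with its own seed, so tasks can run in any order on any worker. All names here (`make_tasks`, `run_task`, `run_all`) are hypothetical, and `mean_pairwise_diff` is only a placeholder for the real Tajima's D calculation.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations


def mean_pairwise_diff(seqs):
    """Placeholder statistic; the real analysis would compute Tajima's D."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def make_tasks(seqs_by_bin, n_subsamples, subsample_size, base_seed=0):
    """One task per (bin, replicate); each carries its own seed string."""
    tasks = []
    for label in sorted(seqs_by_bin):
        seqs = seqs_by_bin[label]
        if len(seqs) < subsample_size:
            continue  # assumption: skip bins too small to subsample
        for rep in range(n_subsamples):
            seed = f"{base_seed}:{label}:{rep}"
            tasks.append((label, rep, seqs, subsample_size, seed))
    return tasks


def run_task(task):
    """Draw one subsample and compute the statistic; no shared state."""
    label, rep, seqs, subsample_size, seed = task
    rng = random.Random(seed)  # per-task RNG, so results do not depend on order
    return label, rep, mean_pairwise_diff(rng.sample(seqs, subsample_size))


def run_all(tasks, workers=None):
    """Run tasks serially (workers=1) or across a pool of processes."""
    if workers == 1:
        return [run_task(t) for t in tasks]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, tasks, chunksize=16))
```

Calling `run_all(tasks, workers=8)` from a script guarded by `if __name__ == "__main__":` would spread the tasks over eight processes. Because each task is seeded from its own label and replicate number, serial and parallel runs should give the same values. Sending the full bin with every task is wasteful for large bins, and a real implementation would likely share the data or send indices instead.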