Speeding up TajD code #2

Arnaud's R code for Tajima's D is in the `TajD_code` folder, along with the files needed to do a small run (250 sequences from the first wave).
It does three things:  
- run and box-plot TajD over time for all sequences  
- run and box-plot TajD per variant over time  
- plot time series of case counts   

For the paper on Quebec waves 1 and 2, he used 1-month bins, with 200 subsamples per month of 20 randomly selected sequences each. The bin width is a parameter we can change (e.g. if there's too much contraction/expansion within one month), but one month is generally a reasonable time interval. Without a formal test, he estimates that a minimum of 10-15 sequences per time interval is needed to get a stable TajD estimate.  
The aim is to run this analysis on Canadian sequences (overall average and per variant of interest, e.g. wildtype, alpha, delta, omicron), a larger dataset covering a longer period of time than in previous uses.  
The code is already parallelized in R (using the doParallel and foreach libraries), but we may be able to parallelize it further, since most steps are independent of each other. Then, getting it to run on CC would be ideal!
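
To make the subsampling scheme and the independence of the steps concrete, here is a minimal Python sketch. It is not a port of Arnaud's R code. The function names (`tajimas_d`, `make_tasks`, `run_task`, `run_all`), the seeding scheme, and the choice to skip bins with fewer sequences than the subsample size are all assumptions made for illustration. It also assumes aligned, equal-length sequences and treats every character (including gaps or `N`) as an allele, which real data would need to handle more carefully.

```python
import math
import random
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences; None if undefined."""
    n = len(seqs)
    # Allele counts per alignment column, keeping only segregating sites.
    sites = [c for c in map(Counter, zip(*seqs)) if len(c) > 1]
    s = len(sites)
    if n < 4 or s == 0:  # the variance term is zero for n < 4
        return None
    pairs = n * (n - 1) / 2
    # Mean number of pairwise differences (pi).
    pi = sum(pairs - sum(k * (k - 1) / 2 for k in c.values()) for c in sites) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def make_tasks(bins, n_subsamples=200, subsample_size=20, base_seed=0):
    """Flatten {(variant, month): [sequences]} into independent tasks.

    Each task carries its own seed, so tasks can run in any order or on
    any worker and still give the same result.
    """
    tasks = []
    for key in sorted(bins):
        if len(bins[key]) < subsample_size:
            continue  # assumption: skip bins too small to subsample
        for rep in range(n_subsamples):
            seed = f"{base_seed}|{key}|{rep}"
            tasks.append((key, rep, bins[key], subsample_size, seed))
    return tasks


def run_task(task):
    """Draw one random subsample for a bin and compute Tajima's D on it."""
    key, rep, seqs, size, seed = task
    subsample = random.Random(seed).sample(seqs, size)
    return key, rep, tajimas_d(subsample)


def run_all(tasks, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run every task on a pool of workers; results keep the task order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(run_task, tasks, chunksize=16))
```

With `bins` keyed by hypothetical `(variant, month)` pairs, `run_all(make_tasks(bins))` would return one `(key, replicate, D)` triple per subsample, ready to be grouped by key for box plots. Because no task depends on another, the same task list could in principle be split across the nodes of a cluster instead of the cores of one machine.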

