Use augur subsample #44

@victorlin

This workflow's subsampling is implemented by a single augur filter call. It works fine, but is limited in capabilities and not portable across pathogen repos.

Description of work

  1. Update the configuration from:

    filter:
      min_length: 8000
      group_by: country year month MuV_genotype division
      exclude: "{build}/exclude.txt"
      include: "{build}/include.txt"
      specific:
        north-america: --subsample-max-sequences 4000 --min-date 2006 --query "region=='North America' & (MuV_genotype=='G')"
        global: --subsample-max-sequences 4000 --min-date 1950

    to something more generic and customizable.

  2. Replace the filter rule with a rule that calls augur subsample.

    This rule should:

    1. Run samples concurrently using threads
    2. Pass a dump of Snakemake's config as a YAML file to --config
    3. Extract the relevant configuration using --config-section
    4. Let Snakemake decide whether to rerun the subsample rule when the config changes
    5. Be compatible with external analysis directories (nextstrain run)
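
A rough Python sketch of how the pieces above might fit together. It is not the implementation proposed here, and several names are assumptions: the `defaults`/`samples` layout, the `subsample` section name, the helper function names, and the `--nthreads` flag are all hypothetical. Only `--config` and `--config-section` come from this issue. Input and output flags for sequences and metadata are left out because the issue does not name them.

```python
import json
import shlex
from pathlib import Path

# Old-style settings, copied from the issue description.
OLD_FILTER = {
    "min_length": 8000,
    "group_by": "country year month MuV_genotype division",
    "exclude": "{build}/exclude.txt",
    "include": "{build}/include.txt",
    "specific": {
        "north-america": (
            "--subsample-max-sequences 4000 --min-date 2006 "
            "--query \"region=='North America' & (MuV_genotype=='G')\""
        ),
        "global": "--subsample-max-sequences 4000 --min-date 1950",
    },
}


def parse_options(option_string):
    """Split '--flag value ...' into {'flag': 'value'}.

    Assumes every flag takes exactly one value, which holds for the
    strings in this issue but not for command lines in general.
    """
    tokens = shlex.split(option_string)
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected flag/value pairs, got: {option_string!r}")
    return {
        flag.lstrip("-").replace("-", "_"): value
        for flag, value in zip(tokens[::2], tokens[1::2])
    }


def to_generic(old_filter):
    """Hypothetical layout: shared settings under 'defaults', one mapping per sample."""
    defaults = {k: v for k, v in old_filter.items() if k != "specific"}
    samples = {
        name: parse_options(opts)
        for name, opts in old_filter["specific"].items()
    }
    return {"defaults": defaults, "samples": samples}


def dump_config(config, path):
    """Write the full config as JSON, which YAML parsers generally accept.

    The file is rewritten only when its content changes, so a
    timestamp-based rerun check would not fire on an identical config.
    """
    path = Path(path)
    text = json.dumps(config, indent=2, sort_keys=True) + "\n"
    if not path.exists() or path.read_text() != text:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    return path


def subsample_command(config_path, section, threads):
    """Build an argument list for the subsample call."""
    return [
        "augur", "subsample",
        "--config", str(config_path),   # named in this issue
        "--config-section", section,    # named in this issue
        "--nthreads", str(threads),     # assumed flag name for the thread count
    ]


# Hypothetical top-level key holding the subsampling settings.
GENERIC = {"subsample": to_generic(OLD_FILTER)}
```

In a Snakemake rule, the dumped file would be an input and the thread count would come from the rule's `threads` directive. Dumping the whole config means any config change can touch the file, so limiting reruns to changes in the relevant section would need extra handling.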
