From 2f5cbbdd96b92ac173fc410147fb2618766da46c Mon Sep 17 00:00:00 2001
From: Emma Hodcroft
Date: Mon, 22 Jun 2020 13:28:02 +0200
Subject: [PATCH] update docs for tutorial

---
 docs/customizing-analysis.md | 51 ++++++++++++++++++++++++++++++++----
 docs/running.md              |  3 +++
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/docs/customizing-analysis.md b/docs/customizing-analysis.md
index 8ab16f0..6ab0d5e 100644
--- a/docs/customizing-analysis.md
+++ b/docs/customizing-analysis.md
@@ -41,16 +41,25 @@ We implement hierarchical subsampling by producing multiple samples at different
 and merge these samples into one file for further analysis.
 A build can specify any number of such samples which can be flexibly restricted to particular
 meta data fields and subsampled from groups with particular properties.
-For canton's this looks like this:
+When specifying subsampling in this way, we'll first take sequences from the 'focal' area, and then select samples from other geographical areas.
+Read further for information on how we select these samples.
+
+In this example, we'll look at a subsampling scheme which defines a `canton`.
+Cantons are regional divisions in Switzerland - below 'country,' but above 'location' (often city-level).
+Here, we'd like to be able to specify a particular 'canton' and do focal sampling there, with contextual samples from elsewhere in the country, other countries in the region, and other regions in the world.
+
+For cantons this looks like this:
 ```yaml
 subsampling:
-  # Default subsampling logic for divisions
-  canton:
+  # We are calling this type of sampling 'canton'.
+  # As a 'canton' is a division, we'll start by defining what kind of sampling we want at the 'division' level, for whatever canton we specify
+  canton: ## Build name
     # Focal samples for division (only samples from a specifed division with 300 seqs per month)
     division:
       group_by: "year month"
       seq_per_group: 300
       exclude: "--exclude-where 'region!={{region}}' 'country!={{country}}' 'division!={{division}}'"
+    # Now we'll specify the types of 'contextual' samples we want:
     # Contextual samples from division's country
     country:
       group_by: "division year month"
@@ -77,11 +86,11 @@ subsampling:
         type: "proximity"
         focus: "division"
 ```
-All entries above canton level specify priorities. Currently, we have only implemented
+All entries above canton level (the 'contextual' samples) specify priorities. Currently, we have only implemented
 one type of priority called `proximity`. It attempts to selected sequences as close as possible
 to the focal samples specified as `focus: division`.
-The argument of the latter has to match the name of one of the other subsamples.
+The argument of the latter has to match the name of one of the other subsamples. 
 
 If you need parameters in a way that isn't represented by the configuration file, [create a new issue in the ncov repository](https://github.com/nextstrain/ncov/issues/new) to let us know.
 
@@ -123,3 +132,35 @@ These are specified in `default_config/clades.tsv` like so:
 A1a ORF3a 251 V
 A1a ORF1a 3606 F
 ```
+
+
+## Keeping a 'Location Build' Up-To-Date
+
+If you are aiming to create a public health build for a state, division, or area of interest, you likely want to keep your analysis up-to-date easily.
+If your run contains contextual subsampling (sequences from outside of your focal area), you should first ensure that you regularly download the latest sequences as input, then re-run the build.
+This way, you always have a build that reflects the most recent SARS-CoV-2 information.
+
+You should also aim to keep the `ncov` repository updated.
+If you've cloned the repository from GitHub, this is done by running `git pull`.
+This downloads any changes that we have made to the repository to your own computer.
+In particular, we add new colors and latitude & longitude information regularly - these should match the new sequences you download, so that you don't need to add this information yourself.
+
+If you don't need to share your build 'profile' with anyone, then it's simple to leave this in the `/profiles` folder.
+It won't be changed when you `git pull` for the latest information.
+
+However, if you want to share your build profile, you'll need to adopt one of the following solutions.
+First, you can 'fork' the entire `ncov` repository, which means you have your own copy of the repository.
+You can then add your profile files to the repository and anyone else can download them as part of your 'fork' of the repository.
+Note that if you do this, you should ensure you `pull` regularly from the original `ncov` repository to keep it up-to-date.
+
+Alternatively, you can create a new repository just to hold your 'profile' files, outside of the `ncov` repository.
+You can then share this repository with others, and it's very simple to keep `ncov` up to date, as you don't change it at all.
+If doing this, it can be easiest to create a 'profiles' folder and imitate the structure found in the 'profiles' folder within `ncov`, but this isn't required.
+Note that to run the build using the profile you'll still need to run the `snakemake` command from within the `ncov` repository, but specify that the profile you want is outside that folder.
+
+For the [`south-usa-sarscov2`](https://github.com/emmahodcroft/south-usa-sarscov2/) example, you can see the `south-central` profile set up in a 'profiles' folder.
+To run this, one would call the following from within `ncov`:
+
+```bash
+ncov$ snakemake --profile ../south-usa-sarscov2/profiles/south-central/
+```
diff --git a/docs/running.md b/docs/running.md
index 23a2a91..6cc9908 100644
--- a/docs/running.md
+++ b/docs/running.md
@@ -88,6 +88,9 @@ Adding it to that file (and rerunning the Snakemake rules downstream of this) sh
 We generate the colors from the `colors` rule in the Snakefile, which uses the [ordering TSV](./default_config/ordering.tsv) to generate these.
 See ['customizing your analysis'](customizing-analysis.md) for more info.
 
+_*A note about locations and colors:*_
+Unless you want to specifically override the colors generated, it's usually easier to _add_ information to the default `ncov` files, so that you can benefit from all the information already in those files.
+
 #### My genomes aren't included in the analysis
 
 There are a few steps where sequences can be removed: