51 changes: 46 additions & 5 deletions docs/customizing-analysis.md
@@ -41,16 +41,25 @@ We implement hierarchical subsampling by producing multiple samples at different
and merge these samples into one file for further analysis.
A build can specify any number of such samples which can be flexibly restricted to particular
metadata fields and subsampled from groups with particular properties.
When specifying subsampling in this way, we'll first take sequences from the 'focal' area, and then select samples from other geographical areas.
Read further for information on how we select these samples.

In this example, we'll look at a subsampling scheme which defines a `canton`.
Cantons are regional divisions in Switzerland: below 'country' but above 'location' (often city-level).
Here, we'd like to be able to specify a particular 'canton' and do focal sampling there, with contextual samples from elsewhere in the country, other countries in the region, and other regions in the world.

For cantons this looks like this:
```yaml
subsampling:
  # We are calling this type of sampling 'canton'.
  # As a 'canton' is a division, we'll start by defining what kind of sampling we want at the 'division' level, for whatever canton we specify
  canton: ## Build name
    # Focal samples for division (only samples from a specified division with 300 seqs per month)
    division:
      group_by: "year month"
      seq_per_group: 300
      exclude: "--exclude-where 'region!={{region}}' 'country!={{country}}' 'division!={{division}}'"
    # Now we'll specify the types of 'contextual' samples we want:
    # Contextual samples from division's country
    country:
      group_by: "division year month"
@@ -77,11 +86,11 @@ subsampling:
        type: "proximity"
        focus: "division"
```
All entries above canton level (the 'contextual' samples) specify priorities. Currently, we have only implemented
one type of priority called `proximity`.
It attempts to select sequences as close as possible to the focal samples
specified as `focus: division`.
The argument of the latter has to match the name of one of the other subsamples.

If you need to specify parameters in a way that the configuration file doesn't support, [create a new issue in the ncov repository](https://github.com/nextstrain/ncov/issues/new) to let us know.

@@ -123,3 +132,35 @@ These are specified in `default_config/clades.tsv` like so:
A1a ORF3a 251 V
A1a ORF1a 3606 F
```


## Keeping a 'Location Build' Up-To-Date

If you are aiming to create a public health build for a state, division, or area of interest, you likely want to keep your analysis up-to-date easily.
If your run contains contextual subsampling (sequences from outside of your focal area), you should first ensure that you regularly download the latest sequences as input, then re-run the build.
This way, you always have a build that reflects the most recent SARS-CoV-2 information.

You should also aim to keep the `ncov` repository updated.
If you've cloned the repository from GitHub, this is done by running `git pull`.
This downloads any changes that we have made to the repository to your own computer.
In particular, we add new colors and latitude & longitude information regularly - these should match the new sequences you download, so that you don't need to add this information yourself.

If you don't need to share your build 'profile' with anyone, you can simply leave it in the `/profiles` folder.
It won't be changed when you `git pull` for the latest information.

However, if you want to share your build profile, you'll need to adopt one of the following solutions.
First, you can 'fork' the entire `ncov` repository, which means you have your own copy of the repository.
You can then add your profile files to the repository and anyone else can download them as part of your 'fork' of the repository.
Note that if you do this, you should ensure you `pull` regularly from the original `ncov` repository to keep it up-to-date.
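
As a rough sketch of what pulling from the original repository into a fork can look like, the script below uses two throwaway local repositories as stand-ins for GitHub, so it runs offline. The directory names, the `my-region` profile, the `upstream` remote name, and the `main` branch name are all assumptions made for illustration. On a real fork you would point `upstream` at the original `ncov` repository's URL and use whichever branch it publishes.

```shell
#!/usr/bin/env bash
# Sketch: keep a fork in sync with the original repository.
# Local throwaway repositories stand in for GitHub so this runs offline.
set -euo pipefail

WORK="$(mktemp -d)"
# Throwaway identity so commits work on machines without git configured.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

# 1. Stand-in for the original repository.
git init -q "$WORK/ncov-original"
cd "$WORK/ncov-original"
git symbolic-ref HEAD refs/heads/main   # fix the branch name for this demo
mkdir -p default_config
echo "initial colors" > default_config/colors.tsv
git add .
git commit -q -m "Initial commit"

# 2. Stand-in for your fork, with a hypothetical profile committed to it.
git clone -q "$WORK/ncov-original" "$WORK/ncov-fork"
cd "$WORK/ncov-fork"
mkdir -p profiles/my-region
echo "# hypothetical build definitions" > profiles/my-region/builds.yaml
git add .
git commit -q -m "Add my-region profile"

# 3. The original repository gains new information.
cd "$WORK/ncov-original"
echo "new colors" >> default_config/colors.tsv
git commit -q -am "Update colors"

# 4. In the fork, register the original as 'upstream' and pull from it.
cd "$WORK/ncov-fork"
git remote add upstream "$WORK/ncov-original"
git pull -q --no-rebase --no-edit upstream main
```

After step 4 the fork contains both the updated colors file and the profile that was added to it, which is the state you want before re-running a build.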

Alternatively, you can create a new repository just to hold your 'profile' files, outside of the `ncov` repository.
You can then share this repository with others, and it's very simple to keep `ncov` up to date, as you don't change it at all.
If doing this, it can be easiest to create a 'profiles' folder and imitate the structure found in the 'profiles' folder within `ncov`, but this isn't required.
Note that to run the build using the profile, you'll still need to run the `snakemake` command from within the `ncov` repository, but specify that the profile you want is outside that folder.
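
A minimal sketch of such a layout follows. The repository name `my-builds`, the profile name `my-region`, and the `builds.yaml` file are hypothetical, and the empty `ncov` directory only stands in for your real clone. Snakemake looks for a `config.yaml` inside the directory passed to `--profile`. The script only assembles and prints the command rather than running `snakemake`.

```shell
#!/usr/bin/env bash
# Sketch: a separate repository holding only profile files, next to ncov.
set -euo pipefail

WORK="$(mktemp -d)"
mkdir -p "$WORK/ncov"                          # stand-in for your ncov clone
mkdir -p "$WORK/my-builds/profiles/my-region"  # hypothetical profile repository

# Snakemake reads config.yaml from the profile directory;
# builds.yaml is a hypothetical file holding the build definitions.
touch "$WORK/my-builds/profiles/my-region/config.yaml"
touch "$WORK/my-builds/profiles/my-region/builds.yaml"

# The build is started from inside ncov, pointing at the outside profile.
cd "$WORK/ncov"
PROFILE="../my-builds/profiles/my-region/"
if [ -f "${PROFILE}config.yaml" ]; then
  CMD="snakemake --profile ${PROFILE}"
  echo "$CMD"
fi
```

Keeping the two repositories side by side means the relative path stays short, and `ncov` itself never holds uncommitted changes that could get in the way of `git pull`.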

For the [`south-usa-sarscov2`](https://github.com/emmahodcroft/south-usa-sarscov2/) example, you can see the `south-central` profile set up in a 'profiles' folder.
To run this, one would call the following from within `ncov`:

```bash
ncov$ snakemake --profile ../south-usa-sarscov2/profiles/south-central/
```
3 changes: 3 additions & 0 deletions docs/running.md
@@ -88,6 +88,9 @@ Adding it to that file (and rerunning the Snakemake rules downstream of this) sh

We generate the colors from the `colors` rule in the Snakefile, which uses the [ordering TSV](./default_config/ordering.tsv) to generate these. See ['customizing your analysis'](customizing-analysis.md) for more info.

_*A note about locations and colors:*_
Unless you want to specifically override the colors generated, it's usually easier to _add_ information to the default `ncov` files, so that you can benefit from all the information already in those files.

#### My genomes aren't included in the analysis

There are a few steps where sequences can be removed: