From 2f5cbbdd96b92ac173fc410147fb2618766da46c Mon Sep 17 00:00:00 2001
From: Emma Hodcroft <emmahodcroft@gmail.com>
Date: Mon, 22 Jun 2020 13:28:02 +0200
Subject: [PATCH] update docs for tutorial

---
 docs/customizing-analysis.md | 51 ++++++++++++++++++++++++++++++++----
 docs/running.md              |  3 +++
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/docs/customizing-analysis.md b/docs/customizing-analysis.md
index 8ab16f0..6ab0d5e 100644
--- a/docs/customizing-analysis.md
+++ b/docs/customizing-analysis.md
@@ -41,16 +41,25 @@ We implement hierarchical subsampling by producing multiple samples at different
 and merge these samples into one file for further analysis.
 A build can specify any number of such samples which can be flexibly restricted to particular
 meta data fields and subsampled from groups with particular properties.
-For canton's this looks like this:
+When specifying subsampling in this way, we'll first take sequences from the 'focal' area, and then select samples from other geographical areas.
+Read further for information on how we select these samples.
+
+In this example, we'll look at a subsampling scheme which defines a `canton`.
+Cantons are regional divisions in Switzerland - below 'country,' but above 'location' (often city-level).
+Here, we'd like to be able to specify a particular 'canton' and do focal sampling there, with contextual samples from elsewhere in the country, other countries in the region, and other regions in the world.
+
+For cantons this looks like this:
 ```yaml
 subsampling:
-  # Default subsampling logic for divisions
-  canton:
+  # We are calling this type of sampling 'canton'.
+  # As a 'canton' is a division, we'll start by defining what kind of sampling we want at the 'division' level, for whatever canton we specify
+  canton: ## Build name
     # Focal samples for division (only samples from a specifed division with 300 seqs per month)
     division:
       group_by: "year month"
       seq_per_group: 300
       exclude: "--exclude-where 'region!={{region}}' 'country!={{country}}' 'division!={{division}}'"
+    # Now we'll specify the types of 'contextual' samples we want:
     # Contextual samples from division's country
     country:
       group_by: "division year month"
@@ -77,11 +86,11 @@ subsampling:
         type: "proximity"
         focus: "division"
 ```
-All entries above canton level specify priorities. Currently, we have only implemented
+All entries above canton level (the 'contextual' samples) specify priorities. Currently, we have only implemented
 one type of priority called `proximity`.
 It attempts to selected sequences as close as possible to the focal samples
 specified as `focus: division`.
-The argument of the latter has to match the name of one of the other subsamples.
+The argument of the latter has to match the name of one of the other subsamples. 
 
 If you need parameters in a way that isn't represented by the configuration file, [create a new issue in the ncov repository](https://github.com/nextstrain/ncov/issues/new) to let us know.
 
@@ -123,3 +132,35 @@ These are specified in `default_config/clades.tsv` like so:
 A1a	ORF3a	251	V
 A1a	ORF1a	3606	F
 ```  
+
+
+## Keeping a 'Location Build' Up-To-Date
+
+If you are aiming to create a public health build for a state, division, or area of interest, you likely want to keep your analysis up-to-date easily.
+If your run contains contextual subsampling (sequences from outside of your focal area), you should first ensure that you regularly download the latest sequences as input, then re-run the build.
+This way, you always have a build that reflects the most recent SARS-CoV-2 information.
+
+You should also aim to keep the `ncov` repository updated.
+If you've cloned the repository from GitHub, this is done by running `git pull`.
+This downloads any changes that we have made to the repository to your own computer.
+In particular, we add new colors and latitude & longitude information regularly - these should match the new sequences you download, so that you don't need to add this information yourself.
+
+If you don't need to share your build 'profile' with anyone, then it's simple to leave this in the `/profiles` folder.
+It won't be changed when you `git pull` for the latest information.
+
+However, if you want to share your build profile, you'll need to adopt one of the following solutions.
+First, you can 'fork' the entire `ncov` repository, which means you have your own copy of the repository. 
+You can then add your profile files to the repository and anyone else can download them as part of your 'fork' of the repository.
+Note that if you do this, you should ensure you `pull` regularly from the original `ncov` repository to keep it up-to-date.
+
+Alternatively, you can create a new repository just to hold your 'profile' files, outside of the `ncov` repository.
+You can then share this repository with others, and it's very simple to keep `ncov` up to date, as you don't change it at all.
+If doing this, it can be easiest to create a 'profiles' folder and imitate the structure found in the 'profiles' folder within `ncov`, but this isn't required.
+Note that to run the build using the profile, you'll still need to run the `snakemake` command from within the `ncov` repository, but specify that the profile you want is outside that folder.
+
+For the [`south-usa-sarscov2`](https://github.com/emmahodcroft/south-usa-sarscov2/) example, you can see the `south-central` profile set up in a 'profiles' folder.
+To run this, one would call the following from within `ncov`:
+
+```bash
+ncov$ snakemake --profile ../south-usa-sarscov2/profiles/south-central/
+```
diff --git a/docs/running.md b/docs/running.md
index 23a2a91..6cc9908 100644
--- a/docs/running.md
+++ b/docs/running.md
@@ -88,6 +88,9 @@ Adding it to that file (and rerunning the Snakemake rules downstream of this) sh
 
 We generate the colors from the `colors` rule in the Snakefile, which uses the [ordering TSV](./default_config/ordering.tsv) to generate these. See ['customizing your analysis'](customizing-analysis.md) for more info.
 
+_*A note about locations and colors:*_
+Unless you want to specifically override the colors generated, it's usually easier to _add_ information to the default `ncov` files, so that you can benefit from all the information already in those files.
+
 #### My genomes aren't included in the analysis
 
 There are a few steps where sequences can be removed: