Downloading disambiguate reference files and alternative solutions

**About**
Currently, the [`cache` subcommand of the pipeline](https://openomics.github.io/weave/usage/cache/) does not download disambiguate's reference files, i.e. the bwa indices for each of the supported reference genomes. As such, these reference files must exist on the host's filesystem prior to execution. These files already exist on BigSky and Biowulf; however, if the pipeline were to be set up on another cluster, they would need to be downloaded outside the `cache` subcommand.

Here is an example command to download disambiguate's reference files from Helix/Biowulf into the current directory:
```bash
rsync -rav -e ssh helix.nih.gov:/data/OpenOmics/references/genomes .
```
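Since the pipeline expects the indices to be present before execution, a quick pre-flight check after the download may be useful. The sketch below is only an illustration: `has_bwa_index` is a hypothetical helper, not part of the pipeline, and the path in the usage comment is an assumed layout. The five suffixes are the files that `bwa index` writes next to its prefix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a complete bwa index exists for a prefix.
# `bwa index` writes PREFIX.amb, PREFIX.ann, PREFIX.bwt, PREFIX.pac, PREFIX.sa
has_bwa_index() {
    local prefix="$1" ext
    for ext in amb ann bwt pac sa; do
        if [[ ! -f "${prefix}.${ext}" ]]; then
            # Report the first missing file on stderr and fail
            echo "missing: ${prefix}.${ext}" >&2
            return 1
        fi
    done
    return 0
}

# Usage (assumed path; adjust to the real layout under genomes/):
# has_bwa_index genomes/hg38/hg38.fa || echo "download the references first"
```

The function returns non-zero at the first missing file, so it can gate a pipeline run in a wrapper script.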

**Road map**
Here are some proposed long-term solutions:
  1. Move the reference files into our data-share directory for easy downloads, and update the `cache` subcommand to pull from this location.
  2. Build the alignment indices on the fly in the output directory and remove them in a post-processing hook. This should not be a rate-limiting step of the pipeline: indexing can start during the bcl2fastq conversion and should finish well before trimming completes. The only downside is a temporary increase in disk usage while the pipeline is running, which is a minor cost if the pipeline cleans up these files after the run completes.
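Option 2 could look roughly like the sketch below. This is an assumption-laden illustration, not the pipeline's implementation: the function names and the `.bwa_index` scratch directory are hypothetical, and in practice the build and cleanup would likely live in workflow rules and hooks rather than a standalone script. It relies only on `bwa index -p PREFIX in.fasta`, which writes the index files under the given prefix.

```bash
#!/usr/bin/env bash
# Sketch: build a throwaway bwa index inside the output directory,
# then delete it once the run has finished.
# BWA may be overridden (e.g. with a full path); defaults to `bwa` on PATH.
BWA="${BWA:-bwa}"

build_tmp_index() {
    local fasta="$1" outdir="$2"
    local name idx_dir
    name="$(basename "$fasta")"
    idx_dir="${outdir}/.bwa_index"      # hypothetical scratch location
    mkdir -p "$idx_dir" || return 1
    # Writes ${idx_dir}/${name}.{amb,ann,bwt,pac,sa}; logs go to stderr
    "$BWA" index -p "${idx_dir}/${name}" "$fasta" >&2 || return 1
    # Print the index prefix so the caller can pass it to `bwa mem`
    echo "${idx_dir}/${name}"
}

cleanup_tmp_index() {
    local outdir="$1"
    # Refuse to run with an empty argument
    [[ -n "$outdir" ]] || return 1
    rm -rf -- "${outdir}/.bwa_index"
}
```

A wrapper could call `build_tmp_index` while bcl2fastq is still running and call `cleanup_tmp_index` from a post-processing step, which keeps the extra disk usage limited to the lifetime of the run.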

Downloading disambiguate reference files and alternative solutions #34

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Downloading disambiguate reference files and alternative solutions #34

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions