
Pipeline

You can download and compile the modules yourself by following the instructions in each module's README. We have also provided an example, explained below, that demonstrates the full pipeline from raw reads to a reduced graph from which contigs can be generated. To get started, a few things are necessary.

Requirements:

Software

From the SORA GitHub

  • parse_dot.py
  • transitive-edge-reduction-module_2.11-1.0.jar
  • Composite Edge Contraction Project-assembly-1.1.jar
  • run_BB_SORA.sh

Other

  • Raw reads
    • An example dataset can be downloaded from the MetaSUB Forensics Challenge for CAMDA 2018

After installing the software requirements above and downloading the required files into a directory, you're almost ready to start the analysis. The only remaining step is to edit the shell script so its variables point at your data; the variables to update are described below.

The first variables to update are the location(s) of your read file(s). The shell script can handle either a single interleaved file or two paired-end files. If you are using two paired-end files, update the INPUT_READ1_FILE and INPUT_READ2_FILE variables; if you are using a single interleaved file, update INPUT_INTERLEAVED_FILE instead.
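
As a minimal sketch, the relevant lines in run_BB_SORA.sh might look like the following. The variable names come from this page, but the file paths are placeholders, and how the script distinguishes the two input modes should be checked against its own comments:

```bash
# Paired-end input: point both variables at your read files (placeholder paths).
INPUT_READ1_FILE=/path/to/sample_R1.fastq
INPUT_READ2_FILE=/path/to/sample_R2.fastq

# Or, for a single interleaved file, set this variable instead:
INPUT_INTERLEAVED_FILE=/path/to/sample_interleaved.fastq
```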

The second is NODES, the number of nodes you would like the Spark job to use. A combined sketch of this and the remaining variables appears after their descriptions below.

The third is the BBmap variable, which stores the directory location of the BBMap tools.

The fourth is the SPARK variable, which stores the installed Spark directory. Enter the full path to its bin directory.

The last variables to update are OUT_DIR and OUT_FILE_PREFACE. OUT_DIR is the directory where the logs and the intermediate and final results are stored; OUT_FILE_PREFACE is prepended to the names of the output files in that directory.
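
Putting the remaining assignments together, an edited block might look like the sketch below. Every path and value is a placeholder to be adapted to your own cluster and data:

```bash
# Number of nodes for the Spark job to use.
NODES=4

# Directory containing the BBMap tools.
BBmap=/opt/bbmap

# Full path to the bin directory of the Spark installation.
SPARK=/opt/spark/bin

# Directory for logs, intermediate files, and final results.
OUT_DIR=/data/sora_out

# Prefix prepended to output file names in OUT_DIR.
OUT_FILE_PREFACE=metasub_run1
```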

After completing the initialization and setup above, you are ready to run the shell script.
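
Since the script takes its configuration from the variables edited above, starting it should be as simple as the following (verify against the script itself in case it expects arguments):

```bash
bash run_BB_SORA.sh
```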
