BWA

BWA (Burrows-Wheeler Aligner) is a widely used life-sciences tool for DNA read alignment, that is, for mapping sequencing reads onto a reference genome.

Overview

This repository contains a parallel implementation of BWA, orchestrated with the Abstract Function Choreography Language (AFCL) and runnable on the xAFCL Enactment Engine.

There are two workflow flavors, workflow and workflow-slim:

workflow is runnable and well tested on the current version of xAFCL.
workflow-slim is an aspirational variant whose dataflow is optimized to its theoretical limit, but it has not been tested on the current version of xAFCL.

Fig 1: workflow-slim.yaml control and data flow
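As a rough illustration of the parallelism behind this data flow, the sketch below distributes paired-end reads over a fixed number of chunks so that each chunk could be aligned independently. It assumes a generic split/align/merge pattern; how the repository itself partitions reads is not described here, and parse_fastq and split_pairs are hypothetical names introduced for this example.

```python
from typing import List, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: List[str]) -> List[FastqRecord]:
    """Group a list of FASTQ lines into 4-line records."""
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ input: line count is not a multiple of 4")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def split_pairs(
    r1: List[FastqRecord], r2: List[FastqRecord], num_splits: int
) -> List[Tuple[List[FastqRecord], List[FastqRecord]]]:
    """Distribute read pairs round-robin over num_splits chunks.

    The i-th record of R1 and the i-th record of R2 always end up in the
    same chunk, so mates are never separated.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    chunks = [([], []) for _ in range(num_splits)]
    for i, (mate1, mate2) in enumerate(zip(r1, r2)):
        chunks[i % num_splits][0].append(mate1)
        chunks[i % num_splits][1].append(mate2)
    return chunks
```

Round-robin assignment keeps the chunks within one read pair of each other in size; contiguous blocks would work equally well as long as both read files are cut at the same record boundaries.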

Get the code

git clone https://github.com/Apollo-Workflows/BWA
cd BWA

Get an input dataset

Name	Description	S3 Bucket	Keys
Escherichia coli	A gram-negative bacterium that can cause food poisoning. The Assembly used is ASM584v2, with a double mutation in gene `hipA`.	`jak-bwa-bucket`	`input/NC_000913.3-hipA7.fasta` `input/reads/hipa7_reads_R1.fastq` `input/reads/hipa7_reads_R2.fastq`
Trypanosoma brucei	A single-cell organism that causes sleeping sickness in humans. The Assembly used is ASM244v1.	`jak-bwa-bucket`	`t-brucei/ASM244v1.fasta` `t-brucei/reads/asm_reads_R1.fastq` `t-brucei/reads/asm_reads_R2.fastq`
Rhizobium jaguaris	A nitrogen-fixing soil bacterium isolated in Mexico. The Assembly used is ASM362775v1.	`jak-bwa-bucket`	`rhi-jaguaris/rhizobium-jaguaris.fasta` `rhi-jaguaris/rhi_jaguaris_reads_R1.fastq` `rhi-jaguaris/rhi_jaguaris_reads_R2.fastq`
Bacteroides thetaiotaomicron	An anaerobic bacterium very common in the gut of humans and other mammals. The Assembly used is ASM1413175v1.	`jak-bwa-bucket`	`bac-thet/bac_thetaiotamicron.fasta` `bac-thet/bac_thetaiotamicron_reads_R1.fastq` `bac-thet/bac_thetaiotamicron_reads_R2.fastq`

Pick one dataset, download its three files (the reference genome and the two read files), and upload them to an S3 bucket of your own, ideally in the same region as your Lambdas. Then update input.json with the bucket name, the keys of your DNA samples, and the desired parallelism:

{
  "s3bucket": "YOUR_BUCKET",
  "files": {
    "reference": "YOUR_KEY_OF_REFERENCE_GENOME.fasta",
    "r1": "YOUR_KEY_OF_reads_R1.fastq",
    "r2": "YOUR_KEY_OF_reads_R2.fastq"
  },
  "numSplits": 3
}
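If you switch between datasets often, generating input.json from a small script can help avoid typos. The following is a minimal sketch that reproduces the structure shown above; build_input is a hypothetical helper, and the checks it performs (non-empty values, numSplits of at least 1) are assumptions rather than documented xAFCL requirements.

```python
import json


def build_input(bucket: str, reference: str, r1: str, r2: str, num_splits: int) -> dict:
    """Return a dict with the same layout as input.json."""
    # Assumed sanity checks; xAFCL itself may accept or reject other values.
    for label, value in (("bucket", bucket), ("reference", reference), ("r1", r1), ("r2", r2)):
        if not value:
            raise ValueError(f"{label} must not be empty")
    if not isinstance(num_splits, int) or num_splits < 1:
        raise ValueError("num_splits must be an integer of at least 1")
    return {
        "s3bucket": bucket,
        "files": {"reference": reference, "r1": r1, "r2": r2},
        "numSplits": num_splits,
    }


if __name__ == "__main__":
    # Keys taken from the Escherichia coli row of the dataset table.
    config = build_input(
        "YOUR_BUCKET",
        "input/NC_000913.3-hipA7.fasta",
        "input/reads/hipa7_reads_R1.fastq",
        "input/reads/hipa7_reads_R2.fastq",
        3,
    )
    print(json.dumps(config, indent=2))
```

Redirect the printed JSON into input.json, or write it with json.dump, once the values look right.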

Deploy the Lambdas

The Lambdas are in the functions folder. To avoid deploying them by hand, update deploy.json and then run npx deply. Alternatively, deploy them manually to AWS Lambda.

Run the workflow

Open workflow.yaml, and update the resource fields to the ARNs of your deployed Lambdas. You can find the ARNs in your AWS Lambda Console.

 ...
 properties:
    - name: "resource"
      value: "arn:aws:lambda:XXXXXXXXXXXXXXXXXXXXXX:bwa-index"
 ...
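Editing every resource field by hand is error-prone, so a small text substitution can be used instead. The sketch below is one possible approach, not part of the repository: set_resource_arns is a hypothetical helper, it assumes that each placeholder ARN ends in the function name (as bwa-index does above), and the region and account ID in the usage example are placeholders.

```python
import re

# Matches: value: "arn:aws:lambda:<anything>:<function-name>"
_ARN_VALUE = re.compile(r'(value:\s*")arn:aws:lambda:[^"]*:([A-Za-z0-9_-]+)(")')


def set_resource_arns(workflow_text: str, arns: dict) -> str:
    """Replace each Lambda ARN whose last segment is a key of `arns`.

    ARNs for function names that are not in `arns` are left untouched.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(2)
        if name not in arns:
            return match.group(0)
        return match.group(1) + arns[name] + match.group(3)

    return _ARN_VALUE.sub(substitute, workflow_text)


if __name__ == "__main__":
    snippet = (
        ' properties:\n'
        '    - name: "resource"\n'
        '      value: "arn:aws:lambda:XXXXXXXXXXXXXXXXXXXXXX:bwa-index"\n'
    )
    # Placeholder region and account ID; use the ARN from your Lambda console.
    print(set_resource_arns(
        snippet,
        {"bwa-index": "arn:aws:lambda:us-east-1:123456789012:function:bwa-index"},
    ))
```

To apply it to the real file, read workflow.yaml as text, pass it through set_resource_arns with one entry per deployed function, and write the result back.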

Then, you can run the workflow:

$ java -jar YOUR_PATH_TO_xAFCL.jar ./workflow.yaml ./input.json
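When the run is scripted, it can be convenient to check that all three files exist before starting the JVM. The sketch below wraps the command above; build_command and run_workflow are hypothetical helper names, and the wrapper passes no options beyond the three arguments shown.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(jar: str, workflow: str = "./workflow.yaml",
                  input_file: str = "./input.json") -> List[str]:
    """Return the argument list for the java invocation shown above."""
    return ["java", "-jar", jar, workflow, input_file]


def run_workflow(jar: str, workflow: str = "./workflow.yaml",
                 input_file: str = "./input.json") -> subprocess.CompletedProcess:
    """Check that all files exist, then run the workflow and wait for it."""
    for path in (jar, workflow, input_file):
        if not Path(path).is_file():
            raise FileNotFoundError(f"missing file: {path}")
    # check=True raises CalledProcessError if java exits with a non-zero status.
    return subprocess.run(build_command(jar, workflow, input_file), check=True)
```

For example, run_workflow("YOUR_PATH_TO_xAFCL.jar") runs the same command as above from the repository root.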

Preliminary Metrics

These measurements were not taken in a controlled test environment. Use them for personal reference only.

References

BWA: Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub. [PMID: 20080505]

SeqKit: Shen W., Le S., Li Y. and Hu F. (2016) SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLOS ONE. [doi:10.1371/journal.pone.0163962]

Assemblies: National Center for Biotechnology Information. Please consult the table above for the exact assemblies used.
