BWA (Burrows-Wheeler Aligner) is a widely used tool for DNA read alignment, a common task in the life sciences.
This repository contains a parallel implementation of BWA, orchestrated with the Abstract Function Choreography Language (AFCL) and runnable on the xAFCL Enactment Engine.
There are two workflow flavors, `workflow` and `workflow-slim`:

- `workflow` is runnable and well tested on the current version of xAFCL.
- `workflow-slim` is an aspirational workflow whose dataflow is optimized to its theoretical limit, but it is not tested on the current version of xAFCL.
Fig. 1: Control and data flow of `workflow-slim.yaml`
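The README does not spell out how the alignment is parallelized. One plausible scheme, sketched here purely for illustration, is to split the paired-end reads into `numSplits` chunks, align each chunk separately, and merge the results. The Python below is a hypothetical helper, not code from `functions`; it shows only the splitting step and assumes uncompressed FASTQ with exactly four lines per record.

```python
import io
from itertools import islice, zip_longest
from typing import Iterator, List, TextIO, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
Record = Tuple[str, str, str, str]
Chunk = Tuple[List[Record], List[Record]]  # (R1 records, R2 records)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain four-line-per-record FASTQ stream."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def split_pairs(r1: TextIO, r2: TextIO, num_splits: int) -> List[Chunk]:
    """Deal mate pairs round-robin into num_splits chunks (hypothetical helper).

    R1 and R2 are read in lockstep, so chunk i of R1 and chunk i of R2
    always contain the same pairs in the same order.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    chunks: List[Chunk] = [([], []) for _ in range(num_splits)]
    for i, (rec1, rec2) in enumerate(zip_longest(read_fastq(r1), read_fastq(r2))):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain a different number of reads")
        chunks[i % num_splits][0].append(rec1)
        chunks[i % num_splits][1].append(rec2)
    return chunks


if __name__ == "__main__":
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nGGCC\n+\nIIII\n@c/1\nTTAA\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nACGT\n+\nIIII\n@b/2\nGGCC\n+\nIIII\n@c/2\nTTAA\n+\nIIII\n")
    for n, (c1, c2) in enumerate(split_pairs(r1, r2, 2)):
        print(n, [r[0] for r in c1], [r[0] for r in c2])
    # 0 ['@a/1', '@c/1'] ['@a/2', '@c/2']
    # 1 ['@b/1'] ['@b/2']
```

Dealing records round-robin balances the chunks without counting the reads first; splitting into contiguous blocks would also work but needs the total read count (or a fixed chunk size) up front.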
```bash
git clone https://github.com/Apollo-Workflows/BWA
cd BWA
```
| Name | Description | S3 Bucket | Keys |
|---|---|---|---|
| Escherichia coli | A Gram-negative bacterium that can cause food poisoning. The assembly used is ASM584v2, with a double mutation in gene hipA. | `jak-bwa-bucket` | `input/NC_000913.3-hipA7.fasta`<br>`input/reads/hipa7_reads_R1.fastq`<br>`input/reads/hipa7_reads_R2.fastq` |
| Trypanosoma brucei | A single-celled parasite that causes sleeping sickness in humans. The assembly used is ASM244v1. | `jak-bwa-bucket` | `t-brucei/ASM244v1.fasta`<br>`t-brucei/reads/asm_reads_R1.fastq`<br>`t-brucei/reads/asm_reads_R2.fastq` |
| Rhizobium jaguaris | A nitrogen-fixing soil bacterium isolated in Mexico. The assembly used is ASM362775v1. | `jak-bwa-bucket` | `rhi-jaguaris/rhizobium-jaguaris.fasta`<br>`rhi-jaguaris/rhi_jaguaris_reads_R1.fastq`<br>`rhi-jaguaris/rhi_jaguaris_reads_R2.fastq` |
| Bacteroides thetaiotaomicron | An anaerobic bacterium very common in the gut of humans and other mammals. The assembly used is ASM1413175v1. | `jak-bwa-bucket` | `bac-thet/bac_thetaiotamicron.fasta`<br>`bac-thet/bac_thetaiotamicron_reads_R1.fastq`<br>`bac-thet/bac_thetaiotamicron_reads_R2.fastq` |
Download the three files of one dataset (the reference genome and the R1 and R2 read files) and upload them to an S3 bucket of your own, ideally in the same region as your Lambdas.
Update `input.json` with your bucket, the keys of your DNA samples, and the desired parallelism (`numSplits`):
```json
{
  "s3bucket": "YOUR_BUCKET",
  "files": {
    "reference": "YOUR_KEY_OF_REFERENCE_GENOME.fasta",
    "r1": "YOUR_KEY_OF_reads_R1.fastq",
    "r2": "YOUR_KEY_OF_reads_R2.fastq"
  },
  "numSplits": 3
}
```
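If you switch between the datasets in the table above, a small script can write `input.json` for you. The sketch below is hypothetical and not part of the repository: it uses only the field names from the `input.json` structure above (`s3bucket`, `files.reference`, `files.r1`, `files.r2`, `numSplits`) and the keys from the table, and it assumes you uploaded the files under those same keys in your own bucket. The short dataset labels (`e-coli`, ...) and the script name `make_input.py` are made up for this example.

```python
import json
import sys
from typing import Dict

# Object keys copied from the dataset table. The short dataset labels are
# hypothetical names chosen for this sketch.
DATASETS: Dict[str, Dict[str, str]] = {
    "e-coli": {
        "reference": "input/NC_000913.3-hipA7.fasta",
        "r1": "input/reads/hipa7_reads_R1.fastq",
        "r2": "input/reads/hipa7_reads_R2.fastq",
    },
    "t-brucei": {
        "reference": "t-brucei/ASM244v1.fasta",
        "r1": "t-brucei/reads/asm_reads_R1.fastq",
        "r2": "t-brucei/reads/asm_reads_R2.fastq",
    },
    "rhi-jaguaris": {
        "reference": "rhi-jaguaris/rhizobium-jaguaris.fasta",
        "r1": "rhi-jaguaris/rhi_jaguaris_reads_R1.fastq",
        "r2": "rhi-jaguaris/rhi_jaguaris_reads_R2.fastq",
    },
    "bac-thet": {
        "reference": "bac-thet/bac_thetaiotamicron.fasta",
        "r1": "bac-thet/bac_thetaiotamicron_reads_R1.fastq",
        "r2": "bac-thet/bac_thetaiotamicron_reads_R2.fastq",
    },
}


def make_input(bucket: str, dataset: str, num_splits: int) -> dict:
    """Build the input.json structure for one dataset (assumes unchanged keys)."""
    if dataset not in DATASETS:
        raise KeyError(f"unknown dataset {dataset!r}; choose one of {sorted(DATASETS)}")
    if isinstance(num_splits, bool) or not isinstance(num_splits, int) or num_splits < 1:
        raise ValueError("numSplits must be a positive integer")
    return {
        "s3bucket": bucket,
        "files": dict(DATASETS[dataset]),  # copy so callers cannot mutate DATASETS
        "numSplits": num_splits,
    }


if __name__ == "__main__":
    # Usage: python make_input.py YOUR_BUCKET e-coli 3
    bucket, dataset, splits = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open("input.json", "w") as fh:
        json.dump(make_input(bucket, dataset, splits), fh, indent=2)
        fh.write("\n")
```

If you stored the files under different keys, edit the entries in `DATASETS` accordingly.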
The Lambda functions are in the `functions` directory.
You can run `npx deply` if you don't want to deploy them by hand; just update `deploy.json` beforehand.
Alternatively, deploy them to AWS Lambda by hand.
Open `workflow.yaml` and update the `resource` fields to the ARNs of your deployed Lambdas. You can find the ARNs in the AWS Lambda console.
```yaml
...
properties:
  - name: "resource"
    value: "arn:aws:lambda:XXXXXXXXXXXXXXXXXXXXXX:bwa-index"
...
```

Then, you can run the workflow:

```bash
java -jar YOUR_PATH_TO_xAFCL.jar ./workflow.yaml ./input.json
```
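If you redeploy the Lambdas often, editing every `resource` field by hand gets tedious. The Python sketch below is a hypothetical helper, not part of the repository. It assumes that each deployed function keeps the name that ends the placeholder ARN (such as `bwa-index`), that all functions live in one region and account, and that the ARNs are unqualified (no version or alias suffix). It rewrites each quoted Lambda ARN to the standard form `arn:aws:lambda:<region>:<account-id>:function:<name>`.

```python
import re
import sys

# Matches a double-quoted Lambda ARN and captures its last ':'-separated
# segment, which this sketch assumes is the function name (e.g. "bwa-index").
_ARN = re.compile(r'"arn:aws:lambda:[^"]*:([A-Za-z0-9_-]+)"')


def patch_resources(workflow_yaml: str, region: str, account_id: str) -> str:
    """Rewrite every quoted Lambda ARN in the text for the given region/account."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")

    def repl(m: re.Match) -> str:
        return f'"arn:aws:lambda:{region}:{account_id}:function:{m.group(1)}"'

    return _ARN.sub(repl, workflow_yaml)


if __name__ == "__main__":
    # Usage (hypothetical): python patch_arns.py workflow.yaml us-east-1 123456789012
    path, region, account = sys.argv[1], sys.argv[2], sys.argv[3]
    with open(path) as fh:
        text = fh.read()
    with open(path, "w") as fh:
        fh.write(patch_resources(text, region, account))
```

Because an already-patched ARN also ends in the function name, running the script twice leaves the file unchanged.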
Measurements were not done in a controlled test environment. Use for personal reference only.
Assemblies: National Center for Biotechnology Information (NCBI). See the table above for the exact assemblies used.
