🧬 SAMap Pipeline - Cross-species transcriptome alignment using BLAST and SAMap

This project wraps SAMap (https://github.com/atarashansky/SAMap) in a Nextflow pipeline.

Current version: v1.1.0

🚀 Quickstart

This project uses a Makefile to simplify common tasks. Each step below can be run manually or through a make target.

I will soon add a make target to clone the example data from SAMap and format it correctly.

1. Build the custom Docker images

```bash
docker build -f Dockerfile.samap -t pipeline/samap:latest .
docker build -f Dockerfile.blast -t pipeline/samap-blast:latest .
```

-or-

```bash
make docker
```

2. Run the pipeline

```bash
nextflow run main.nf -with-docker
```

-or-

```bash
make run
```

📂 Input Files

The pipeline expects the following input files to be present:

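- `sample_sheet.csv`: one row of metadata per sample
- `*.fasta`: a transcriptome (or proteome) for each sample
- `*.h5ad`: the single-cell data (AnnData) for each sample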

An example tree:

```
sample_sheet.csv
data/
├── transcriptomes/
│   ├── hydra.fasta
│   ├── planarian.fasta
│   └── schistosome.fasta
├── hydra.h5ad
├── planarian.h5ad
└── schistosome.h5ad
```

📜 Sample Sheet Format

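The sample sheet describes the metadata for each sample. A sample is processed only if it is listed, and correctly described, in the sample sheet. Each row gives a sample `id`, the path to its `.h5ad` file, the path to its transcriptome (or proteome) FASTA file, and `annotation`, the column of the `.h5ad` file's `obs` table that holds the cell labels. An example `sample_sheet.csv` might look like: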

```
id,h5ad,fasta,annotation
00,data/planarian.h5ad,data/transcriptomes/planarian_transcriptome.fasta,cluster
01,data/hydra_mod.h5ad,data/transcriptomes/hydra_transcriptome.fasta,Cluster
02,data/schistosome.h5ad,data/transcriptomes/schistosome_proteome.fasta,tissue
```
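A quick local check of the sample sheet can catch missing columns, empty fields, or duplicate IDs before a run. The sketch below is illustrative only and is not the pipeline's own validation: the function name `read_sample_sheet` is hypothetical, and it assumes exactly the four columns shown in the example above.

```python
from __future__ import annotations

import csv
import io
from pathlib import Path
from typing import Optional

# Columns used in the example sample sheet.
REQUIRED_COLUMNS = ("id", "h5ad", "fasta", "annotation")


def read_sample_sheet(text: str, base_dir: Optional[Path] = None) -> list[dict[str, str]]:
    """Parse sample-sheet CSV text and validate every row.

    If base_dir is given, the h5ad and fasta paths are also checked
    for existence relative to that directory.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet is missing columns: {missing}")

    rows: list[dict[str, str]] = []
    seen_ids: set[str] = set()
    for line_no, row in enumerate(reader, start=2):  # record 1 is the header
        # DictReader fills absent trailing fields with None.
        empty = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty fields {empty}")
        clean = {c: row[c].strip() for c in REQUIRED_COLUMNS}
        if clean["id"] in seen_ids:
            raise ValueError(f"line {line_no}: duplicate id {clean['id']!r}")
        seen_ids.add(clean["id"])
        if base_dir is not None:
            for col in ("h5ad", "fasta"):
                if not (base_dir / clean[col]).is_file():
                    raise FileNotFoundError(f"line {line_no}: {clean[col]} not found")
        rows.append(clean)
    return rows


SHEET = """\
id,h5ad,fasta,annotation
00,data/planarian.h5ad,data/transcriptomes/planarian_transcriptome.fasta,cluster
01,data/hydra_mod.h5ad,data/transcriptomes/hydra_transcriptome.fasta,Cluster
02,data/schistosome.h5ad,data/transcriptomes/schistosome_proteome.fasta,tissue
"""
print([r["annotation"] for r in read_sample_sheet(SHEET)])  # ['cluster', 'Cluster', 'tissue']
```

Passing `base_dir=Path(".")` from the repository root would additionally confirm that every listed file exists.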

⚙️ Parameters

| Parameter | Requirement | Description | Default |
|---|---|---|---|
| `run_id` | Optional | Custom run ID | `null` |
| `sample_sheet` | Optional | Sample sheet describing sample metadata | `'sample_sheet.csv'` |
| `data_dir` | Optional | Path to directory containing sample data | `'data'` |
| `maps_dir` | Optional | Path to directory of precomputed BLAST maps | `null` |
| `results_dir` | Optional | Path to directory where results are stored | `'results'` |
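Pipeline parameters are set on the Nextflow command line with a double dash (for example `--run_id`), whereas Nextflow's own options such as `-with-docker` take a single dash. The hypothetical helper below only illustrates that convention by assembling a command line; it assumes each parameter in the table is declared as `params.<name>` in `main.nf` or `nextflow.config`.

```python
from __future__ import annotations

import shlex
from typing import Optional

# Defaults copied from the parameters table above; None stands for `null`.
DEFAULT_PARAMS: dict[str, Optional[str]] = {
    "run_id": None,
    "sample_sheet": "sample_sheet.csv",
    "data_dir": "data",
    "maps_dir": None,
    "results_dir": "results",
}


def build_nextflow_command(overrides: dict[str, Optional[str]], with_docker: bool = True) -> list[str]:
    """Return the argv list for `nextflow run main.nf` with parameter overrides.

    Nextflow's own options use a single dash (-with-docker), while pipeline
    parameters (params.*) are set on the command line with a double dash.
    """
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    cmd = ["nextflow", "run", "main.nf"]
    if with_docker:
        cmd.append("-with-docker")
    for name, value in overrides.items():
        # Skip values that are unset or identical to the documented default.
        if value is not None and value != DEFAULT_PARAMS[name]:
            cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_nextflow_command({"run_id": "demo", "maps_dir": "maps"})
print(shlex.join(cmd))  # nextflow run main.nf -with-docker --run_id demo --maps_dir maps
```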

🏁 Output Files

Results are stored in `{results_dir}/{run_id}/`, which is `results/{run_id}/` with the default `results_dir`.

| Path | Description |
|---|---|
| `{run_id}_sample_sheet.csv` | Processed sample sheet |
| `csv/hms.csv` | Highest mapping scores |
| `csv/pms.csv` | Pairwise mapping scores |
| `plots/chord.html` | Chord plot |
| `plots/sankey.html` | Sankey plot |
| `plots/scatter.png` | Scatter plot |
| `samap_objects/samap_results.pkl` | Pickled SAMAP object after running SAMap |
| `samap_objects/samap.pkl` | Pickled SAMAP object before running SAMap |
| `sams/*` | Pickled SAM objects, named according to the 2-char hash assigned to their sample |
| `logs/*` | Logfile output for each module |
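After a run finishes, a short script can confirm that the files listed above were produced. This is a hypothetical helper, not part of the pipeline; it assumes the outputs live under `{results_dir}/{run_id}/` and only checks that `sams/` and `logs/` are non-empty, because their file names vary per sample and per module.

```python
from __future__ import annotations

from pathlib import Path

# Fixed-name outputs from the table above, relative to {results_dir}/{run_id}/.
EXPECTED_OUTPUTS = [
    "{run_id}_sample_sheet.csv",
    "csv/hms.csv",
    "csv/pms.csv",
    "plots/chord.html",
    "plots/sankey.html",
    "plots/scatter.png",
    "samap_objects/samap_results.pkl",
    "samap_objects/samap.pkl",
]


def missing_outputs(results_dir: str | Path, run_id: str) -> list[str]:
    """Return the expected outputs that are absent for the given run."""
    run_dir = Path(results_dir) / run_id
    missing = []
    for template in EXPECTED_OUTPUTS:
        rel = template.format(run_id=run_id)
        if not (run_dir / rel).is_file():
            missing.append(rel)
    # sams/ and logs/ hold one file per sample or module; require at least one.
    for folder in ("sams", "logs"):
        d = run_dir / folder
        if not d.is_dir() or not any(d.iterdir()):
            missing.append(folder + "/")
    return missing


print(missing_outputs("results", "demo"))
```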

🧱 Module Overview

1. PREPROCESS

Reads `sample_sheet.csv`, classifies each transcriptome from its input FASTA file, and assigns each sample a unique two-character ID. Outputs an enriched sample sheet with the metadata used downstream.
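The two PREPROCESS tasks can be sketched in plain Python. This is an illustration under stated assumptions, not the module's actual code: `classify_fasta` guesses nucleotide vs. protein from the share of A/C/G/T/U/N characters (the `nucl`/`prot` labels are chosen to match the sequence-type options of SAMap's `map_genes.sh` helper), and `two_char_ids` derives a two-letter code from a SHA-1 hash of each sample id, using linear probing to avoid collisions. Both function names, the 0.9 threshold, and the hashing scheme are hypothetical.

```python
from __future__ import annotations

import hashlib
import string
from itertools import product

NUCLEOTIDE_CHARS = set("ACGTUN")
# All 676 two-letter codes: "aa", "ab", ..., "zz".
ALL_CODES = [a + b for a, b in product(string.ascii_lowercase, repeat=2)]


def classify_fasta(text: str, threshold: float = 0.9) -> str:
    """Return "nucl" if most residues look like nucleotides, else "prot"."""
    seq = "".join(
        line.strip() for line in text.splitlines()
        if line.strip() and not line.startswith(">")  # skip headers and blank lines
    ).upper().replace("-", "")
    if not seq:
        raise ValueError("FASTA text contains no sequence data")
    nucleotide_fraction = sum(ch in NUCLEOTIDE_CHARS for ch in seq) / len(seq)
    return "nucl" if nucleotide_fraction >= threshold else "prot"


def two_char_ids(sample_ids: list[str]) -> dict[str, str]:
    """Map each sample id to a unique two-letter code derived from its hash."""
    assigned: dict[str, str] = {}
    used: set[str] = set()
    for sample_id in sample_ids:
        start = int(hashlib.sha1(sample_id.encode()).hexdigest(), 16) % len(ALL_CODES)
        # Linear probing: walk forward from the hashed slot to the first free code.
        for offset in range(len(ALL_CODES)):
            code = ALL_CODES[(start + offset) % len(ALL_CODES)]
            if code not in used:
                break
        else:
            raise ValueError("more samples than available two-letter codes")
        used.add(code)
        assigned[sample_id] = code
    return assigned


print(classify_fasta(">t1\nACGTTGCA\n>t2\nGGCCAATT\n"))  # nucl
print(classify_fasta(">p1\nMKTAYIAKQRQISFVKSHFSRQ\n"))  # prot
print(two_char_ids(["00", "01", "02"]))
```

Hashing keeps a sample's code stable across runs as long as no earlier sample collides with it; a simple counter would be just as unique but would change whenever rows are reordered.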

2. RUN_BLAST_PAIR

For each unordered pair of species, runs a reciprocal BLAST search to generate the mapping files. This step is skipped if `--use_precomputed_blast` is true.
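With n species there are n(n-1)/2 unordered pairs, and each pair yields two map files, one per direction. The sketch below enumerates those jobs; it assumes the `maps/{id1}{id2}/{id1}_to_{id2}.txt` layout described in the SAMap README, and the helper name `blast_jobs` and the example species IDs are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from pathlib import PurePosixPath
from typing import Iterator


def blast_jobs(species_ids: list[str], maps_dir: str = "maps") -> Iterator[dict]:
    """Yield one reciprocal-BLAST job per unordered pair of species IDs.

    Paths assume SAMap's maps layout:
    {maps_dir}/{a}{b}/{a}_to_{b}.txt and {maps_dir}/{a}{b}/{b}_to_{a}.txt.
    """
    # Sorting makes the pair order (and therefore the directory names) deterministic.
    for a, b in combinations(sorted(species_ids), 2):
        pair_dir = PurePosixPath(maps_dir) / f"{a}{b}"
        yield {
            "pair": (a, b),
            "forward": str(pair_dir / f"{a}_to_{b}.txt"),  # mapping from a to b
            "reverse": str(pair_dir / f"{b}_to_{a}.txt"),  # mapping from b to a
        }


# Three species give 3 * 2 / 2 = 3 pairs.
for job in blast_jobs(["pl", "hy", "sc"]):
    print(job["pair"], job["forward"], job["reverse"])
```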

3. LOAD_SAMS

Loads input .h5ad files and constructs SAM objects required for SAMap. Outputs pickled SAM objects.

4. BUILD_SAMAP

Combines the SAM objects and reciprocal BLAST maps to build a SAMAP object.
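BUILD_SAMAP needs to know, for each species, which data to load and which cell labels to compare. The sketch below turns sample-sheet rows into two dictionaries keyed by species ID; the helper name `samap_inputs` and its `id_map` argument are hypothetical. The commented lines show roughly how the SAMap README passes such dictionaries to `SAMAP` and `get_mapping_scores`; nf-samap's BUILD_SAMAP may instead pass the pickled SAM objects produced by LOAD_SAMS.

```python
from __future__ import annotations


def samap_inputs(
    rows: list[dict[str, str]], id_map: dict[str, str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Build per-species dictionaries from sample-sheet rows.

    rows:   sample-sheet rows with 'id', 'h5ad' and 'annotation' keys.
    id_map: sample id -> two-character species ID (hypothetical input).
    Returns (filenames, keys): species ID -> .h5ad path, and
    species ID -> name of the cell-label column in that file's obs table.
    """
    filenames: dict[str, str] = {}
    keys: dict[str, str] = {}
    for row in rows:
        species_id = id_map[row["id"]]
        if species_id in filenames:
            raise ValueError(f"species ID {species_id!r} assigned twice")
        filenames[species_id] = row["h5ad"]
        keys[species_id] = row["annotation"]
    return filenames, keys


# With SAMap installed, the SAMap README uses dictionaries like these roughly as:
#   sm = SAMAP(filenames, f_maps="maps/")
#   sm.run(pairwise=True)
#   D, MappingTable = get_mapping_scores(sm, keys, n_top=0)
```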

5. RUN_SAMAP

Runs the SAMap algorithm on the built object to calculate pairwise gene mapping scores.

6. VISUALIZE_SAMAP

Generates chord plots, Sankey diagrams, scatter plots, and CSV summaries of the alignment results for downstream analysis and interpretation.
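The relationship between the two CSV summaries can be illustrated with a toy example: given a table of pairwise mapping scores between cell types, a "highest mapping score" for each cell type is its best-scoring partner. This plain-Python sketch is a simplification, not SAMap's `get_mapping_scores`; the function name is hypothetical and the cell-type names and scores are made up for illustration.

```python
from __future__ import annotations


def highest_mapping_scores(
    pms: dict[str, dict[str, float]], threshold: float = 0.0
) -> dict[str, tuple[str, float]]:
    """For each cell type (row), return its best-scoring partner (column).

    Rows whose best score falls below `threshold` are dropped, similar in
    spirit to filtering weak alignments before drawing a Sankey plot.
    """
    best: dict[str, tuple[str, float]] = {}
    for cell_type, partners in pms.items():
        if not partners:
            continue
        partner, score = max(partners.items(), key=lambda kv: kv[1])
        if score >= threshold:
            best[cell_type] = (partner, score)
    return best


# Toy scores for illustration only, not real SAMap output.
toy_pms = {
    "pl_neoblast": {"sc_stem": 0.62, "sc_muscle": 0.05},
    "pl_muscle": {"sc_stem": 0.10, "sc_muscle": 0.71},
    "pl_epidermis": {"sc_stem": 0.02, "sc_muscle": 0.03},
}
print(highest_mapping_scores(toy_pms, threshold=0.1))
# {'pl_neoblast': ('sc_stem', 0.62), 'pl_muscle': ('sc_muscle', 0.71)}
```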

🔗 Links and Acknowledgements

SAMap Repository: https://github.com/atarashansky/SAMap
SAMap Paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC8139856/
SAMap Docker Image: https://hub.docker.com/r/avianalter/samap
BLAST Docker Image: https://hub.docker.com/r/staphb/blast

👤 Authors and Licenses

Ryan Sonderman

GitHub: @RyanSonder

Riley Grindle

GitHub: @Riley-Grindle

This pipeline is licensed under the MIT License. See the LICENSE file for full details.
