SteamedFish6/MetaRanker

MetaRanker

A Computational Pipeline for Ranking Resistome Risk of Metagenomic Samples

Workflow:

[Figure: MetaRanker workflow diagram]

Requirements:

MetaRanker is written in Python 3. Please install the following packages in your Python environment:

  • numpy
  • pandas
  • biopython

The following tools are needed for sequence processing. Please install them and add them to $PATH so that the commands can be called from the shell:

  • blastn
  • cd-hit
  • bwa/minimap2
  • samtools
  • pigz

Installation:

  1. Clone this repository to your local directory.

  2. Install the Python packages. Skip this step if they are already installed.

pip install numpy pandas biopython
  3. Install blastn, cd-hit, bwa/minimap2, samtools and pigz. Installation instructions can be found in each tool's documentation. On Ubuntu, these tools can be installed via apt (except for cd-hit):
sudo apt-get install ncbi-blast+
sudo apt-get install bwa
sudo apt-get install minimap2
sudo apt-get install samtools
sudo apt-get install pigz

If you downloaded prebuilt binaries or compiled the tools from source, please add them to $PATH:

export PATH=$PATH:/home/software/bwa # add the directory containing the executable to $PATH; put this line in ~/.bashrc to make it permanent
cd /home/software/cd-hit
make openmp=yes # compile the multi-threaded version
make install # install the compiled tools to /usr/local/bin

Check that the tools are installed. For each installed tool, the output should not be blank:

which blastn
which cd-hit
which bwa
which minimap2
which samtools
which pigz
  4. Change to the directory where you cloned this repository.
cd /path/of/MetaRanker #example path
  5. Run the following command to check the installation and see the help message.
python MetaRanker.py -h
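
The `which` checks above can also be scripted. The following is a minimal sketch, not part of MetaRanker, that uses Python's standard `shutil.which` to report which of the required commands cannot be found on `$PATH`. The helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Commands checked with `which` in the installation steps above.
REQUIRED_TOOLS = ["blastn", "cd-hit", "bwa", "minimap2", "samtools", "pigz"]


def find_missing_tools(tools, path=None):
    """Return the tools that cannot be found on the search path."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```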

Usage:

  1. Basic usage:
python MetaRanker.py -c contigs.fa -r reads.clean.fastq.gz -o output_dir -t 16
  2. For paired-end reads:
python MetaRanker.py -c contigs.fa -r reads_1.clean.fastq.gz -R reads_2.clean.fastq.gz -o output_dir -t 16
  3. For Nanopore reads:
python MetaRanker.py -c contigs.fa -r reads.nanopore.clean.fastq.gz --nanopore -o output_dir -t 16
  4. For PacBio reads:
python MetaRanker.py -c contigs.fa -r reads.pacbio.clean.fastq.gz --pacbio -o output_dir -t 16
  5. Set the minimum number (1000) and the cut-off length (250) of contigs:
python MetaRanker.py -c contigs.fa -r reads.clean.fastq.gz -o output_dir -t 16 --minnum 1000 --minlen 250
  6. Keep the original names of contigs. Contig names produced by assemblers usually contain white space, and blastn truncates such names at the first space, so MetaRanker renames contigs by default. To disable renaming:
python MetaRanker.py -c contigs.fa -r reads.clean.fastq.gz -o output_dir -t 16 --no_rename_contigs
  7. Set the weights of the ARG, MGE and VF databases to 1000, 1000 and 1000:
python MetaRanker.py -c contigs.fa -r reads.clean.fastq.gz -o output_dir -t 16 --weight 1000 1000 1000
  8. Pass parameters to the sequence processing tools. MetaRanker mainly uses subprocess to call sequence processing commands. Some of the parameters used by MetaRanker can be set on the command line, including --blast_evalue, --blast_identity, --blast_cover_len, --cdhit_identity and -t/--threads:
python MetaRanker.py -c contigs.fa -r reads.clean.fastq.gz -o output_dir -t 64 --blast_evalue 1e-5 --blast_identity 0.9 --blast_cover_len 85 --cdhit_identity 0.9
  9. Force overwriting of existing output files of blastn, cd-hit, bwa, minimap2 and samtools. By default, MetaRanker does not call a sequence processing command if its output file already exists.
python MetaRanker.py -c contigs.fa -r reads.clean.fastq.gz -o output_dir -t 16 --force
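
The skip-unless-forced behaviour described in the last item above can be pictured with a short sketch. This is an illustration only, not MetaRanker's actual code: the function `run_if_missing` and its parameters are hypothetical, and it assumes each command writes a single, known output file.

```python
import os
import subprocess


def run_if_missing(command, output_path, force=False):
    """Run `command` unless `output_path` already exists.

    Returns True if the command was executed, False if it was skipped.
    """
    if os.path.exists(output_path) and not force:
        return False  # reuse the existing result
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(command, check=True)
    return True
```

With a check like this, rerunning a partially finished analysis only repeats the steps whose outputs are missing, while a force flag repeats everything.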

Note

  1. Preprocessing of reads: To ensure the precision of sequence alignment, quality control of raw reads is needed. We recommend fastp or trimmomatic for quality control of Illumina reads, and chopper for Nanopore or PacBio reads. Host sequence removal can also be applied if needed.
  2. Assembly of contigs: We recommend megahit or metaspades for assembling Illumina reads (megahit may be faster), and flye for correcting, assembling and polishing Nanopore or PacBio reads.

An example pipeline:

fastp -i sample_1.fastq.gz -o sample_1.clean.fastq.gz -I sample_2.fastq.gz -O sample_2.clean.fastq.gz -w 16
megahit -1 sample_1.clean.fastq.gz -2 sample_2.clean.fastq.gz -o contig_dir --out-prefix sample -t 16
python MetaRanker.py -c contig_dir/sample.contigs.fa -r sample_1.clean.fastq.gz -R sample_2.clean.fastq.gz -o metaranker_output -t 16
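
For several paired-end samples, the three commands above can be generated per sample. The following is a minimal sketch that only uses the options shown in the example. The helper `build_pipeline`, the sample names, and the per-sample directory names (`<sample>_contigs`, `<sample>_metaranker`) are assumptions made for illustration, and input files are assumed to follow the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming.

```python
import shlex


def build_pipeline(sample, threads=16):
    """Return the fastp, megahit and MetaRanker commands for one sample."""
    raw_1, raw_2 = f"{sample}_1.fastq.gz", f"{sample}_2.fastq.gz"
    clean_1, clean_2 = f"{sample}_1.clean.fastq.gz", f"{sample}_2.clean.fastq.gz"
    contig_dir = f"{sample}_contigs"  # hypothetical directory name
    return [
        ["fastp", "-i", raw_1, "-o", clean_1, "-I", raw_2, "-O", clean_2,
         "-w", str(threads)],
        ["megahit", "-1", clean_1, "-2", clean_2, "-o", contig_dir,
         "--out-prefix", sample, "-t", str(threads)],
        ["python", "MetaRanker.py", "-c", f"{contig_dir}/{sample}.contigs.fa",
         "-r", clean_1, "-R", clean_2, "-o", f"{sample}_metaranker",
         "-t", str(threads)],
    ]


if __name__ == "__main__":
    for name in ["sampleA", "sampleB"]:  # hypothetical sample names
        for command in build_pipeline(name):
            print(shlex.join(command))
```

The script only prints the commands, so they can be reviewed or written to a shell script before anything is executed.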

Outputs:

The output directory should contain the following result files:

  1. RiskVector, RiskModule, CoocurScore, RiskIndex, ReadsNum, BasesNum and ContigsNum of the sample

metaranker_output/risk_result/RiskStat_sample.tsv

  2. Co-occurrence matrix of the sample

metaranker_output/risk_result/RiskMatrix_sample.csv

  3. BLAST results in M8 format

metaranker_output/output_M8/*.tsv

  4. Filtered, annotated M8 tables

metaranker_output/preprocessed_M8/*.tsv

  5. Filtered, annotated M8 tables with categorized genes

metaranker_output/preprocessed_M8/categorized_M8/*.tsv

  6. Sequences and depths of Risk Elements

metaranker_output/risk_elements/*

  7. BPM abundance of Risk Elements

metaranker_output/BPM/*.tsv

  8. RPM abundance of Risk Elements

metaranker_output/RPM/*.tsv

  9. High-risk sequences with co-occurrence structures extracted from contigs (a visualization tool can be found in MetaRanker-utils)

metaranker_output/coocur_structures/*.fasta

  10. Contigs renamed by MetaRanker (can be removed as needed)

metaranker_output/temp/*.fa
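
The renamed contigs in the last item exist because FASTA headers produced by assemblers often contain white space. As an illustration only (MetaRanker's own naming scheme is not documented here), the sketch below rewrites each header to a whitespace-free ID and keeps a mapping back to the original name. The function `rename_fasta_headers` and the `contig_N` pattern are hypothetical.

```python
def rename_fasta_headers(lines, prefix="contig"):
    """Replace FASTA headers with whitespace-free IDs.

    Returns (renamed_lines, mapping), where mapping is new ID -> old header.
    """
    renamed, mapping = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new_id = f"{prefix}_{len(mapping) + 1}"
            mapping[new_id] = line[1:]  # original header without ">"
            renamed.append(f">{new_id}")
        else:
            renamed.append(line)  # sequence lines are kept unchanged
    return renamed, mapping


if __name__ == "__main__":
    # Made-up headers that mimic assembler output containing spaces.
    example = [">k141_0 flag=1 multi=2.0 len=312", "ACGTACGT",
               ">k141_1 flag=0", "GGCC"]
    new_lines, name_map = rename_fasta_headers(example)
    print("\n".join(new_lines))
    print(name_map)
```

Keeping the mapping makes it possible to translate results back to the assembler's contig names afterwards.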

Tip

Analysis and visualization tools can be found in MetaRanker-utils.

The co-occurrence structures of each sample can be visualized with /CoocurStructure/CooccurStructureVisualizer.py in MetaRanker-utils.

After concatenating RiskStat_*.tsv of samples, a 3D hazard space plot can be produced with /PlottingScripts/fig2/ThreeDHazardSpace.py in MetaRanker-utils.
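
The concatenation step mentioned above could look like the following. This is a minimal sketch using only the standard library, assuming every `RiskStat_*.tsv` is tab-separated and starts with an identical header row. The function name `concat_riskstat`, the glob pattern, and the output file name are hypothetical.

```python
import csv
import glob


def concat_riskstat(paths, out_path):
    """Concatenate TSV files that share one header row; return the row count."""
    header, rows = None, []
    for path in sorted(paths):
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"Header mismatch in {path}")
            rows.extend(reader)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Assumes each sample was run with its own output directory.
    files = glob.glob("*/risk_result/RiskStat_*.tsv")
    print(concat_riskstat(files, "RiskStat_all.tsv"), "rows written")
```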

Publications

This project has not been published yet, but you can still try it on your metagenomic sequencing data.

License

This project is licensed under the MIT License - see the LICENSE file for details.
