Deconvoluting metagenomic assemblies via Hi-C connect network
- Docker
- 16 GB of memory recommended
- 40 GB of memory recommended if running CheckM
We use Docker to build an environment for the process.
git clone https://github.com/changlabtw/Bin3C_SLM.git
cd Bin3C_SLM
# build the image (Docker image names must be lowercase)
docker build -t bin3c_slm . --no-cache
# run the docker container; use a volume to share data from the host machine
docker run -it -d -v <path of data from host>:/home/vol --name bin3c_slm0 bin3c_slm
# get a bash shell in the container
docker exec -it bin3c_slm0 sh

Bin3C_SLM is based on bin3C, with additional homemade functions to perform specific clustering and evaluation.
- mzd/cluster.py: replaces the original cluster.py of bin3C, adding two functions, getGraph() and getSLMresult()
  - getGraph(): converts the seq_map to an undirected NetworkX graph using to_Graph() from the original cluster.py of bin3C, and generates the edge file with the write_edgelist function from the NetworkX package.
  - getSLMresult(): combines _read_table() and part of the cluster_map() function from the original cluster.py of bin3C to get the sequence indices of every cluster.
- map2g.py: generates an edge file from a contact map file using getGraph() in mzd/cluster.py
- SLM2seq.py: generates the sequence FASTA of each bin using getSLMresult() in mzd/cluster.py
- ezcheck-full.py: evaluates the ranks (near, substantial, moderate) of bins in the CheckM result file, bin_stats_ext.tsv
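The graph-export idea behind getGraph() can be sketched with plain NetworkX calls. The sketch below is illustrative, not the repository's exact API: the toy `seq_map` contact map and the output file name are assumptions, and `from_scipy_sparse_array` stands in for whatever conversion to_Graph() performs internally.

```python
# Sketch of getGraph()'s edge-export step; the contact-map layout
# (a SciPy sparse matrix of Hi-C link counts) is assumed for illustration.
import networkx as nx
import scipy.sparse as sp

# toy symmetric contact map: 3 contigs, Hi-C links between (0,1) and (1,2)
seq_map = sp.coo_matrix(
    ([5, 5, 2, 2], ([0, 1, 1, 2], [1, 0, 2, 1])), shape=(3, 3)
)

# convert the sparse map to an undirected weighted graph
g = nx.from_scipy_sparse_array(seq_map)

# write an edge list that the SLM ModularityOptimizer can consume
nx.write_edgelist(g, "contact_edges.txt", data=["weight"])
```

Each output line holds `node_a node_b weight`, which is the plain-text network format the clustering step reads.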
Docker
- Dockerfile: for building the Docker image.
- requirements.txt/requirementspy3.txt: for installing the required Python packages while building the Docker image.
Tool
- bin3c_slm.sh: a wrapper script that runs the whole Bin3C_SLM process, including CheckM.
The original dataset derives from a human fecal sample and contains a shotgun read-set (SRR6131123) and two separate Hi-C read-sets produced with two restriction enzymes, MluCI (SRR6131122) and Sau3AI (SRR6131123). The following example data is generated after initial processing.
- scaffolds.fasta: shotgun reads cleaned with BBDuk from BBTools and assembled with metaSPAdes.
- merged_scaf.bam: merged from the two BAM files produced by mapping the MluCI and Sau3AI Hi-C read-sets.
There are two ways to run Bin3C_SLM: with one command or step by step.
- One command. We supply a simple script that runs the whole process, including metagenome deconvolution and result evaluation.
# bin3c_slm.sh <input:assembled fasta> <input:Hi-C bam file> <output:path> <slm resolution=25.0>
bin3c_slm.sh /home/vol/data/scaffolds.fasta /home/vol/data/merged_scaf.bam /home/vol/output 25.0
- Step-by-step.
- generate contact map
/home/bin3C/bin3C.py mkmap -e MluCI -e Sau3AI <input:assembled fasta> <input:Hi-C bam file> <output:path>
- generate connect network
/home/bin3C/map2g.py -i <input:contact map> -o <output:path>
- genome binning
java -jar /home/bin3C/external/ModularityOptimizer.jar <input:connect network> <output:path/result.txt> 1 25.0 3 10 10 9001882 1
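ModularityOptimizer writes its clustering as a plain text file with one cluster id per line, line i holding the cluster assigned to node (sequence index) i. A minimal sketch of turning that output into per-cluster sequence indices, as getSLMresult() does conceptually (the file layout is an assumption to verify against the script):

```python
from collections import defaultdict

def read_slm_clusters(path):
    """Group sequence indices by SLM cluster id.

    Assumes the ModularityOptimizer output format: line i holds the
    cluster id assigned to node (sequence index) i.
    """
    clusters = defaultdict(list)
    with open(path) as fh:
        for seq_idx, line in enumerate(fh):
            clusters[int(line.strip())].append(seq_idx)
    return dict(clusters)
```

Each resulting list of indices can then be mapped back to contig names in the contact map to emit one FASTA per bin.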
- get fasta seqs based on binning
/home/bin3C/SLM2seq.py <input:slm result> <input:contact map> <output:path>
- perform CheckM and evaluate performance
checkm lineage_wf -t 8 <input:fasta path> <output:path>
python3 /home/bin3C/ezcheck-full.py -f -i <input:bin_stats_ext.tsv from checkm> -o <output:path/ezcheck_result.csv>
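The rank labels reported by ezcheck-full.py derive from CheckM's completeness and contamination numbers. The sketch below reads `bin_stats_ext.tsv` (each line is a bin id, a tab, then a Python-dict-like stats string) and assigns ranks; the thresholds used here (near >= 90%/<= 5%, substantial >= 70%/<= 10%, moderate >= 50%/<= 10%) are commonly used bin3C-style cut-offs and are an assumption, not necessarily the script's exact values.

```python
import ast

def rank_bin(completeness, contamination):
    """Classify a bin by CheckM completeness/contamination (percent).

    Thresholds are assumed, bin3C-style ranks; check ezcheck-full.py
    for the script's actual cut-offs.
    """
    if completeness >= 90 and contamination <= 5:
        return "near"
    if completeness >= 70 and contamination <= 10:
        return "substantial"
    if completeness >= 50 and contamination <= 10:
        return "moderate"
    return "partial"

def parse_bin_stats_ext(path):
    """Parse CheckM's bin_stats_ext.tsv: '<bin id>\\t<python dict string>'."""
    ranks = {}
    with open(path) as fh:
        for line in fh:
            bin_id, stats_str = line.rstrip("\n").split("\t", 1)
            stats = ast.literal_eval(stats_str)
            ranks[bin_id] = rank_bin(stats["Completeness"], stats["Contamination"])
    return ranks
```

The resulting mapping of bin id to rank is essentially what ezcheck-full.py summarizes into ezcheck_result.csv.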