This repository is associated with the following manuscript: [TO DO]
A schematic of the pipeline is shown below. We used QIIME2 to download and process amplicon sequencing data up to taxonomic profiles at genus level, and performed downstream analysis in Python (and R). Processed datasets used in our analysis are available in the datasets/ folder.
To install the required environments and packages, you can run:
conda env create -f general.yml && conda env create -f qiime2-amplicon-2024.10.yml
We use QIIME2 with the q2-fondue plugin to process amplicon datasets from NCBI on a Slurm cluster. Inputs can be an accession or an existing study directory.
If a study directory containing an accession.tsv file already exists:
cd bash_scripts
./pipeline.sh --study_id study_name
If a study directory does not exist, you can provide a study accession and a study name; the pipeline then creates a study folder containing an accession file:
./pipeline.sh --study_id study_name --accession EXAMPLE123456,EXAMPLE234567
To run only selected steps of the pipeline:
./pipeline.sh --study_id study_name --accession EXAMPLE123456,EXAMPLE234567 --run_download --run_cutadapt
For the full list of pipeline steps, options and flags, run:
./pipeline.sh --help
To reproduce the analysis and associated figures in the paper, you can run:
python3 plot_figure_[figure_number].py
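To regenerate every figure in one go rather than one script at a time, a small wrapper along the following lines may help. This is a minimal sketch, not part of the repository: the function names `find_figure_scripts` and `run_figure_scripts` are hypothetical, and it assumes the figure scripts sit together in one directory, follow the `plot_figure_*.py` naming pattern shown above, and need no command-line arguments.

```python
import subprocess
import sys
from pathlib import Path


def find_figure_scripts(directory="."):
    """Return the plot_figure_*.py scripts in `directory`, sorted by file name."""
    return sorted(Path(directory).glob("plot_figure_*.py"))


def run_figure_scripts(directory="."):
    """Run each figure script in turn and return the names of those that failed."""
    failed = []
    for script in find_figure_scripts(directory):
        print(f"Running {script.name}")
        # Assumption: each script takes no arguments and is run from its own folder.
        result = subprocess.run([sys.executable, script.name], cwd=directory)
        if result.returncode != 0:
            failed.append(script.name)
    return failed
```

Calling `run_figure_scripts()` from the folder that holds the scripts runs them with the current Python interpreter and returns a list of any that exited with a non-zero status, so one failing figure does not stop the rest.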