Snakemake pipeline for metagenomic classification with Krakenuniq. It runs
per-sample classification against a Krakenuniq database and produces compressed
classification tables plus plain-text reports.
Repository layout:
- workflow/Snakefile - main Snakemake workflow
- workflow/rules/krakenuniq.smk - Krakenuniq rules
- dataset_example/config/config.yml - example dataset configuration
- dataset_example/config/units.tsv - example dataset sample sheet
- run_dataset.sh - run locally
- run_dataset_slurm.sh - submit to SLURM
Requirements:
- snakemake (9+ recommended)
- krakenuniq
- Python with pandas
Create a dataset directory with a config/ folder that contains:
- config/config.yml
- config/units.tsv
An example dataset lives at dataset_example/.
Example config/config.yml (pairs with the units.tsv example below):
prefix: demo
out_dir: results
units: config/units.tsv
krakenuniq:
  db: /path/to/krakenuniq/db
  threads: 8
config/units.tsv columns:
- unit_id: sample identifier used for the output sub-folder
- unit_prefix: library/run identifier used in output filenames
- fq: path to the input FASTQ file
Example:
| unit_id | unit_prefix | fq |
|---|---|---|
| SAMPLE01 | SAMPLE01_L001 | /data/fastq/SAMPLE01_L001.fq.gz |
| SAMPLE01 | SAMPLE01_L002 | /data/fastq/SAMPLE01_L002.fq.gz |
| SAMPLE02 | SAMPLE02_L001 | /data/fastq/SAMPLE02_L001.fq.gz |
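Before launching a run it can help to check the sample sheet. The sketch below is not part of the pipeline: it parses a tab-separated units sheet with the standard library, confirms the three documented columns are present, and groups libraries by unit_id. The function name `read_units` and the rule that a unit_prefix must not repeat within a unit_id are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ("unit_id", "unit_prefix", "fq")


def read_units(handle):
    """Parse a tab-separated units sheet and group libraries by unit_id.

    Hypothetical helper, not the workflow's own loader.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"units sheet is missing columns: {missing}")

    groups = defaultdict(list)
    seen = set()
    for row in reader:
        key = (row["unit_id"], row["unit_prefix"])
        # Assumption: a repeated (unit_id, unit_prefix) pair would map two
        # inputs to the same output filenames, so it is treated as an error.
        if key in seen:
            raise ValueError(f"duplicate unit_id/unit_prefix pair: {key}")
        seen.add(key)
        groups[row["unit_id"]].append((row["unit_prefix"], row["fq"]))
    return dict(groups)


# Same rows as the example table, written as tab-separated text.
EXAMPLE = (
    "unit_id\tunit_prefix\tfq\n"
    "SAMPLE01\tSAMPLE01_L001\t/data/fastq/SAMPLE01_L001.fq.gz\n"
    "SAMPLE01\tSAMPLE01_L002\t/data/fastq/SAMPLE01_L002.fq.gz\n"
    "SAMPLE02\tSAMPLE02_L001\t/data/fastq/SAMPLE02_L001.fq.gz\n"
)

units = read_units(io.StringIO(EXAMPLE))
for unit_id, libraries in units.items():
    print(unit_id, [prefix for prefix, _ in libraries])
```

To check a real sheet, pass an open file instead of the in-memory example, for instance `read_units(open("config/units.tsv", newline=""))`.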
From the repo root:
bash run_dataset.sh /path/to/dataset
bash run_dataset.sh dataset_example
You can pass extra Snakemake args after the dataset path:
bash run_dataset.sh /path/to/dataset -- --cores 8 --rerun-incomplete
To submit to SLURM:
bash run_dataset_slurm.sh /path/to/dataset --jobs 50 --partition general
bash run_dataset_slurm.sh dataset_example --dry-run
See full options:
bash run_dataset_slurm.sh --help
For each unit_id and unit_prefix, outputs are written under ${out_dir}/{unit_id}/:
- classify/ - compressed Krakenuniq classification tables
- report/ - Krakenuniq report tables (plus an aggregated report per unit)
- log/ - Krakenuniq logs
- stages/ - completion stamps created by the top-level target
Example output structure:
${out_dir}/
├── SAMPLE01/
│   ├── classify/
│   │   ├── SAMPLE01_L001.demo.krakenuniq_class.tsv.gz
│   │   └── SAMPLE01_L002.demo.krakenuniq_class.tsv.gz
│   ├── report/
│   │   ├── SAMPLE01_L001.demo.krakenuniq_report.tsv
│   │   ├── SAMPLE01_L002.demo.krakenuniq_report.tsv
│   │   └── SAMPLE01.demo.krakenuniq_report_aggregated.tsv
│   ├── log/
│   │   ├── SAMPLE01_L001.demo.classify.log
│   │   └── SAMPLE01_L002.demo.classify.log
│   └── stages/
│       ├── SAMPLE01_L001.all.done
│       └── SAMPLE01_L002.all.done
└── SAMPLE02/
    ├── classify/
    │   └── SAMPLE02_L001.demo.krakenuniq_class.tsv.gz
    ├── report/
    │   ├── SAMPLE02_L001.demo.krakenuniq_report.tsv
    │   └── SAMPLE02.demo.krakenuniq_report_aggregated.tsv
    ├── log/
    │   └── SAMPLE02_L001.demo.classify.log
    └── stages/
        └── SAMPLE02_L001.all.done
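The naming pattern in this tree can be written down as a small helper, which is handy for checking that a finished run produced everything. This is a sketch inferred only from the example filenames above, not the workflow's actual rule definitions. The function name `expected_outputs` is hypothetical, and the pattern (for example, that stage stamps omit the prefix) is assumed to generalize from the example.

```python
from pathlib import PurePosixPath


def expected_outputs(out_dir, prefix, unit_id, unit_prefixes):
    """List the output paths the example tree implies for one unit_id.

    Hypothetical helper; filename patterns are inferred from the example.
    """
    base = PurePosixPath(out_dir) / unit_id
    paths = []
    for unit_prefix in unit_prefixes:
        # One classification table, report, log, and stamp per library.
        paths.append(base / "classify" / f"{unit_prefix}.{prefix}.krakenuniq_class.tsv.gz")
        paths.append(base / "report" / f"{unit_prefix}.{prefix}.krakenuniq_report.tsv")
        paths.append(base / "log" / f"{unit_prefix}.{prefix}.classify.log")
        paths.append(base / "stages" / f"{unit_prefix}.all.done")
    # One aggregated report per unit_id, named after the unit itself.
    paths.append(base / "report" / f"{unit_id}.{prefix}.krakenuniq_report_aggregated.tsv")
    return [str(path) for path in paths]


outputs = expected_outputs("results", "demo", "SAMPLE02", ["SAMPLE02_L001"])
for path in outputs:
    print(path)
```

With the values from the example config (prefix demo, out_dir results), the call above lists the five SAMPLE02 files shown in the tree.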