
krakenuniq_workflow

Snakemake pipeline for metagenomic classification with KrakenUniq. It runs per-sample classification against a KrakenUniq database and produces compressed classification tables plus plain-text reports.

Repository layout

  • workflow/Snakefile - main Snakemake workflow
  • workflow/rules/krakenuniq.smk - KrakenUniq classification rules
  • dataset_example/config/config.yml - example dataset configuration
  • dataset_example/config/units.tsv - example sample sheet
  • run_dataset.sh - run the workflow locally
  • run_dataset_slurm.sh - submit the workflow to SLURM

Requirements

  • snakemake (9+ recommended)
  • krakenuniq
  • Python with pandas

Configuration

Create a dataset directory with a config/ folder that contains:

  • config/config.yml
  • config/units.tsv

An example dataset lives at dataset_example/.

Example config/config.yml (pairs with the units.tsv example below):

prefix: demo
out_dir: results
units: config/units.tsv
krakenuniq:
  db: /path/to/krakenuniq/db
  threads: 8
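Snakemake parses config.yml into a plain config dictionary; as an illustration only (the key names come from the example above, the loading code is a sketch, not the workflow's own):

```python
import yaml  # PyYAML; installed alongside Snakemake

# Inline copy of the example config.yml above
config_text = """
prefix: demo
out_dir: results
units: config/units.tsv
krakenuniq:
  db: /path/to/krakenuniq/db
  threads: 8
"""

config = yaml.safe_load(config_text)

# Values a classification rule would look up
db = config["krakenuniq"]["db"]
threads = config["krakenuniq"]["threads"]
print(db, threads)  # /path/to/krakenuniq/db 8
```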

config/units.tsv columns:

  • unit_id: sample identifier; used as the output sub-folder name
  • unit_prefix: library/run identifier; used in output filenames
  • fq: path to the input FASTQ file

Example:

unit_id unit_prefix fq
SAMPLE01 SAMPLE01_L001 /data/fastq/SAMPLE01_L001.fq.gz
SAMPLE01 SAMPLE01_L002 /data/fastq/SAMPLE01_L002.fq.gz
SAMPLE02 SAMPLE02_L001 /data/fastq/SAMPLE02_L001.fq.gz
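Since pandas is a requirement, the sample sheet is presumably read with it; a minimal sketch of loading and sanity-checking the table above (the validation checks are illustrative, not taken from the Snakefile):

```python
import io
import pandas as pd

# Inline copy of the example units.tsv (tab-separated)
tsv = (
    "unit_id\tunit_prefix\tfq\n"
    "SAMPLE01\tSAMPLE01_L001\t/data/fastq/SAMPLE01_L001.fq.gz\n"
    "SAMPLE01\tSAMPLE01_L002\t/data/fastq/SAMPLE01_L002.fq.gz\n"
    "SAMPLE02\tSAMPLE02_L001\t/data/fastq/SAMPLE02_L001.fq.gz\n"
)

units = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str)

# Basic sanity checks before handing the table to the workflow
required = {"unit_id", "unit_prefix", "fq"}
missing = required - set(units.columns)
assert not missing, f"units.tsv is missing columns: {missing}"
assert units["unit_prefix"].is_unique, "unit_prefix values must be unique"

# One sample can span several libraries/runs
print(units.groupby("unit_id")["unit_prefix"].apply(list).to_dict())
# {'SAMPLE01': ['SAMPLE01_L001', 'SAMPLE01_L002'], 'SAMPLE02': ['SAMPLE02_L001']}
```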

Run locally

From the repo root:

bash run_dataset.sh /path/to/dataset
bash run_dataset.sh dataset_example

You can pass extra Snakemake args after the dataset path:

bash run_dataset.sh /path/to/dataset -- --cores 8 --rerun-incomplete

Run on SLURM

bash run_dataset_slurm.sh /path/to/dataset --jobs 50 --partition general
bash run_dataset_slurm.sh dataset_example --dry-run

See full options:

bash run_dataset_slurm.sh --help

Outputs

For each unit_id and unit_prefix, outputs are written under ${out_dir}/{unit_id}/:

  • classify/ - compressed KrakenUniq classification tables
  • report/ - KrakenUniq report tables, plus one aggregated report per unit_id
  • log/ - KrakenUniq logs
  • stages/ - completion stamps created by the top-level target

Example output structure:

${out_dir}/
├── SAMPLE01/
│   ├── classify/
│   │   ├── SAMPLE01_L001.demo.krakenuniq_class.tsv.gz
│   │   └── SAMPLE01_L002.demo.krakenuniq_class.tsv.gz
│   ├── report/
│   │   ├── SAMPLE01_L001.demo.krakenuniq_report.tsv
│   │   ├── SAMPLE01_L002.demo.krakenuniq_report.tsv
│   │   └── SAMPLE01.demo.krakenuniq_report_aggregated.tsv
│   ├── log/
│   │   ├── SAMPLE01_L001.demo.classify.log
│   │   └── SAMPLE01_L002.demo.classify.log
│   └── stages/
│       ├── SAMPLE01_L001.all.done
│       └── SAMPLE01_L002.all.done
└── SAMPLE02/
    ├── classify/
    │   └── SAMPLE02_L001.demo.krakenuniq_class.tsv.gz
    ├── report/
    │   ├── SAMPLE02_L001.demo.krakenuniq_report.tsv
    │   └── SAMPLE02.demo.krakenuniq_report_aggregated.tsv
    ├── log/
    │   └── SAMPLE02_L001.demo.classify.log
    └── stages/
        └── SAMPLE02_L001.all.done
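The tree above follows a fixed naming pattern. A hypothetical helper (names assumed from the example layout, not from the workflow code) that enumerates the expected classification targets:

```python
# Expected classification outputs follow the pattern
# {out_dir}/{unit_id}/classify/{unit_prefix}.{prefix}.krakenuniq_class.tsv.gz
def classify_targets(out_dir, prefix, units):
    """units: iterable of (unit_id, unit_prefix) pairs."""
    return [
        f"{out_dir}/{unit_id}/classify/{unit_prefix}.{prefix}.krakenuniq_class.tsv.gz"
        for unit_id, unit_prefix in units
    ]

targets = classify_targets(
    "results", "demo",
    [("SAMPLE01", "SAMPLE01_L001"), ("SAMPLE02", "SAMPLE02_L001")],
)
print(targets[0])
# results/SAMPLE01/classify/SAMPLE01_L001.demo.krakenuniq_class.tsv.gz
```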
