Kevin-Brockers/nextflow-pipeline-snippet

dimeloseq

General

  • Disclaimer: This is only a small public snippet of a larger pipeline

  • This pipeline processes DiMeLo-Seq ONT data, see here.

    • In brief, DiMeLo-Seq is a method that uses site-directed m6A DNA methylation to study protein-chromatin binding events.

Default configuration

  • The default configuration uses a mouse T2T reference genome, both to benefit fully from long-read sequencing and to study highly repetitive regions and structural repeats in the genome.

Initial set up

  • The Nextflow-Apptainer conda environment used to develop the pipeline is defined in ./nextflow_env.yml.
  • BEFORE the first run:
    • Download a reference genome.
      • Run the ./get_ref_genome.sh script. This will download the mouse T2T genome, C57BL/6J haplotype.
      • To loosely track the genome version, the download date is added to the file name. Please modify the reference_genome_path accordingly, see below.
    • The pipeline can be executed by running the ./run_workflow.sh script.
      • The script defines some global variables; please adjust them to your system.
        • These are the singularity/apptainer temp and cache directories.

Parameters

-> All params listed below are required!

  • sample_manifest: path to sample manifest
  • outdir: path to result directory
  • reference_genome_path: path to reference genome, .fasta
  • base_model_complex: base model name
  • mod_model_complex: modification model name; the pipeline also handles combinations
  • min_modbase_coverage: minimum coverage required for a modification site to be considered
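
Since every parameter above is required, a small pre-flight check can catch a missing entry before a run is submitted. The following is a minimal sketch in Python, not part of the pipeline itself: the helper name `missing_params` and all example values are hypothetical placeholders.

```python
# Parameter names taken from the list above; everything else is illustrative.
REQUIRED_PARAMS = (
    "sample_manifest",
    "outdir",
    "reference_genome_path",
    "base_model_complex",
    "mod_model_complex",
    "min_modbase_coverage",
)


def missing_params(params: dict) -> list:
    """Return the required parameter names that are absent or empty."""
    return [name for name in REQUIRED_PARAMS if params.get(name) in (None, "")]


# Placeholder values for illustration only (assumed, not from the pipeline).
example = {
    "sample_manifest": "manifest.csv",
    "outdir": "results",
    "reference_genome_path": "reference/genome.fasta",
    "base_model_complex": "<base model name>",
    "mod_model_complex": "<modification model name>",
}

print(missing_params(example))  # ['min_modbase_coverage']
```

A numeric value of 0 is treated as present, so only parameters that are really missing or empty are reported.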

Sample manifest

  • Flowcell: Name of the flowcell (required)
  • Barcode: Barcode of the sample - this pipeline assumes multiplexed samples per flowcell (required)
  • Target: DiMeLo-Seq target, e.g. CTCF or similar (required)
  • Condition: Experimental condition, e.g. WT, DMSO or drug-treated (required)
  • Tech_rep: Technical replicate, for cases where the same library was sequenced multiple times (required)
  • Bio_rep: Biological replicate (required)
  • Pairing: Used to compute corrected m6A enrichments, e.g. log2FC(on-target / off-target) or log2FC(target antibody / IgG control) (required)
  • Sequencer: Sequencer used, e.g. MinION (optional)
  • Sequencing_mode: Sequencing mode, e.g. 400bps_5kHz (optional)
  • Library_prep_kit: Manufacturer's name of the library prep kit, e.g. 'SQK-NBD114-24' (required)
  • FLowcell_chemistry: Flowcell chemistry used, e.g. 'V14' (optional)
  • Pod5_path: File path to the flowcell pod5 directory (required)
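
The required columns above can be checked before launching a run. The sketch below is a hypothetical helper, not the pipeline's own validation. It assumes a comma-separated manifest with a header row (the actual delimiter is not stated here), and the barcode, pairing and path values in the demo are invented for illustration.

```python
import csv
import io

# Columns marked "(required)" in the list above.
REQUIRED_COLUMNS = [
    "Flowcell", "Barcode", "Target", "Condition", "Tech_rep",
    "Bio_rep", "Pairing", "Library_prep_kit", "Pod5_path",
]


def check_manifest(handle, delimiter=","):
    """Return a list of human-readable problems found in the manifest."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems  # row checks make no sense without the columns
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty value for {col}")
    return problems


# Invented example rows; the second sample lacks its Pod5_path.
demo = io.StringIO(
    "Flowcell,Barcode,Target,Condition,Tech_rep,Bio_rep,Pairing,Library_prep_kit,Pod5_path\n"
    "FC1,barcode01,CTCF,WT,1,1,pair1,SQK-NBD114-24,/data/FC1/pod5\n"
    "FC1,barcode02,IgG,WT,1,1,pair1,SQK-NBD114-24,\n"
)
print(check_manifest(demo))  # ['line 3: empty value for Pod5_path']
```

Optional columns are ignored on purpose, so a manifest without Sequencer or Sequencing_mode still passes.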

Profiles

  • The pipeline comes with several profiles:
    • slurm_cpu
      • Default profile for SLURM cluster execution.
      • Dorado basecalling is still submitted to the GPU queue, because modbase calling on CPUs is extremely slow on Linux machines and fails on Apple ARM chips.
    • slurm_gpu
      • Submits all jobs to the GPU queue on a SLURM cluster.
    • singularity
      • Enables singularity / apptainer execution.
    • test_gpu
      • Used for developing the pipeline on a small subset, including (mod)basecalling on GPUs.

-> Before you run the pipeline on a SLURM cluster, make sure queue and qos names are set properly.
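
Nextflow accepts several profiles as a comma-separated list after `-profile`, so an execution profile can presumably be combined with the singularity profile. The sketch below only assembles such a command line and does not run it. The entry script `main.nf` and the file `params.yml` are assumptions, not names confirmed by this repository, and ./run_workflow.sh may invoke Nextflow differently.

```python
import shlex

# Profile names from the list above.
KNOWN_PROFILES = {"slurm_cpu", "slurm_gpu", "singularity", "test_gpu"}


def build_command(profiles, params_file, script="main.nf"):
    """Assemble a `nextflow run` argument list for the given profiles.

    `script` and `params_file` are hypothetical names used for illustration.
    """
    unknown = [p for p in profiles if p not in KNOWN_PROFILES]
    if unknown:
        raise ValueError(f"unknown profile(s): {', '.join(unknown)}")
    return [
        "nextflow", "run", script,
        "-profile", ",".join(profiles),   # comma-separated, no spaces
        "-params-file", params_file,      # YAML or JSON file with the params
    ]


cmd = build_command(["slurm_cpu", "singularity"], "params.yml")
print(shlex.join(cmd))
# nextflow run main.nf -profile slurm_cpu,singularity -params-file params.yml
```

Rejecting unknown profile names up front keeps a typo from surfacing only after the job has been submitted.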

Current limitations

nf-test

  • nf-test is not yet implemented for the dorado processes or for the pipeline as a whole; it will be added in the future.

Type checking - new in 25.10

  • Will be added in the future.
