- Disclaimer: This is only a small public snippet of a larger pipeline.
- This pipeline processes DiMeLo-Seq ONT data, see here.
- In brief, DiMeLo-Seq is a method that uses site-directed m6A DNA methylation to study protein-chromatin binding events.
- The default configuration uses a mouse T2T reference genome to fully benefit from long-read sequencing and to study highly repetitive regions and structural repeats in the genome.
- The Nextflow-Apptainer conda environment, which has been used to develop the pipeline, can be found here: `./nextflow_env.yml`.
- BEFORE the first run:
  - Download a reference genome.
    - Run the `./get_ref_genome.sh` script. This will download the mouse T2T genome, C57BL/6J haplotype.
    - To loosely track the genome version, the download date is added to the name. Please modify the `reference_genome_path` accordingly, see below.
- The pipeline can be executed by running the `./run_workflow.sh` script.
  - It defines some global variables, please modify them accordingly.
    - Those are the singularity/apptainer temp and cache directories.
-> All params listed below are required!
- `sample_manifest`: path to sample manifest
- `outdir`: path to result directory
- `reference_genome_path`: path to reference genome, .fasta
- `base_model_complex`: base model name
- `mod_model_complex`: modification model name, pipeline handles combinations as well
- `min_modbase_coverage`: minimal coverage for a modification site to be considered
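To illustrate what `min_modbase_coverage` controls, here is a minimal Python sketch of a coverage filter. It is not the pipeline's code: the per-site record layout (`SiteCall` with `coverage` and `n_modified` fields) and the function name `filter_by_coverage` are hypothetical, and the sketch assumes the threshold is inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCall:
    """Hypothetical per-site modification summary (not the pipeline's real format)."""
    chrom: str
    pos: int
    coverage: int    # number of reads with a valid call at this site
    n_modified: int  # number of those reads called as m6A


def filter_by_coverage(sites, min_modbase_coverage):
    """Keep sites whose coverage is at least min_modbase_coverage.

    Returns (site, modified_fraction) pairs. The inclusive threshold is an
    assumption; sites with zero coverage are always dropped.
    """
    kept = []
    for site in sites:
        if site.coverage > 0 and site.coverage >= min_modbase_coverage:
            kept.append((site, site.n_modified / site.coverage))
    return kept


# Made-up example data.
sites = [
    SiteCall("chr1", 100, coverage=3, n_modified=1),
    SiteCall("chr1", 250, coverage=10, n_modified=4),
    SiteCall("chr2", 40, coverage=5, n_modified=5),
]
for site, fraction in filter_by_coverage(sites, min_modbase_coverage=5):
    print(site.chrom, site.pos, f"{fraction:.2f}")
```

With a threshold of 5, the first site (coverage 3) is dropped and the other two are kept. A higher threshold gives more reliable per-site fractions at the cost of fewer sites.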
- Columns of the sample manifest:
  - `Flowcell`: Name of the flowcell (required)
  - `Barcode`: Barcode of the sample; this pipeline assumes multiplexed samples per flowcell (required)
  - `Target`: DiMeLo-Seq target, e.g. CTCF or similar (required)
  - `Condition`: Experimental condition, e.g. WT, DMSO or drug-treated (required)
  - `Tech_rep`: Technical replicate, in case the same libraries were sequenced multiple times (required)
  - `Bio_rep`: Biological replicate (required)
  - `Pairing`: Used to compute corrected m6A enrichments, e.g. log2FC(on-target / off-target) or log2FC(target-AB / IgG control) (required)
  - `Sequencer`: Sequencer used, e.g. MinION (optional)
  - `Sequencing_mode`: Sequencing mode, e.g. 400bps_5kHz (optional)
  - `Library_prep_kit`: Manufacturer's library prep kit name, e.g. 'SQK-NBD114-24' (required)
  - `FLowcell_chemistry`: Flowcell chemistry used, e.g. 'V14' (optional)
  - `Pod5_path`: File path to the flowcell pod5 directory (required)
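Before launching a run, it can help to check that the manifest contains every required column. The following Python sketch is one way to do that. It assumes the manifest is a comma-separated text file with a header row, which is not confirmed by this README. The function name `validate_manifest` and all example values (`FC01`, `barcode01`, `pair1`, `/path/to/pod5`) are hypothetical.

```python
import csv
import io

# Columns marked "(required)" in the list above; optional columns are not checked.
REQUIRED_COLUMNS = [
    "Flowcell", "Barcode", "Target", "Condition", "Tech_rep",
    "Bio_rep", "Pairing", "Library_prep_kit", "Pod5_path",
]


def validate_manifest(handle, delimiter=","):
    """Return a list of problems; an empty list means the manifest passed.

    Assumes a delimited text file with a header row (an assumption, not
    something the pipeline documentation states).
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [
        f"missing required column: {column}"
        for column in REQUIRED_COLUMNS
        if column not in header
    ]
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if column in header and not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty value for {column}")
    return problems


# Made-up manifest: the second sample is missing its Pod5_path.
example = io.StringIO(
    "Flowcell,Barcode,Target,Condition,Tech_rep,Bio_rep,Pairing,Library_prep_kit,Pod5_path\n"
    "FC01,barcode01,CTCF,WT,1,1,pair1,SQK-NBD114-24,/path/to/pod5\n"
    "FC01,barcode02,IgG,WT,1,1,pair1,SQK-NBD114-24,\n"
)
for problem in validate_manifest(example):
    print(problem)
```

For the made-up manifest above, the only reported problem is the empty `Pod5_path` on line 3. To check a real file, pass an open file handle instead of the `io.StringIO` object.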
- The pipeline comes with several profiles:
  - `slurm_cpu`
    - Default profile for SLURM cluster execution.
    - Dorado basecalling will still be submitted to the GPU queue, since modbase calling on CPUs is prohibitively slow on Linux machines and fails on Apple ARM chips.
  - `slurm_gpu`
    - Submits all jobs to the GPU queue on a SLURM cluster.
  - `singularity`
    - Enables singularity / apptainer execution.
  - `test_gpu`
    - Used for developing the pipeline on a small subset, including (mod)basecalling on GPUs.
-> Before you run the pipeline on a SLURM cluster, make sure queue and qos names are set properly.
- Currently, nf-test is not implemented for the dorado processes or for the pipeline as a whole; it will be added in the future.
- Will be added in the future.