ModDotPlot tutorial

Get a "Regular" instance: 4 cores, 8 GB RAM, 30 GB storage

Learning goals

  • Understand main ModDotPlot parameters
  • Interpret results
  • Customize plots to create publication quality figures

Where's the data?

Generating a basic plot

Once GitPod has finished downloading external dependencies and data, verify that ModDotPlot has installed correctly.

moddotplot -h
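If the help text does not appear, a quick check of whether the executable is on your `PATH` can narrow things down. This is a minimal sketch; `have_tool` is a hypothetical helper name, not part of ModDotPlot.

```shell
# Hypothetical helper: return 0 if the named program is on PATH, non-zero otherwise.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool moddotplot; then
  echo "moddotplot found at: $(command -v moddotplot)"
else
  echo "moddotplot is not on PATH; the environment may still be installing"
fi
```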

ModDotPlot can run in either static mode, which produces simple image files, or interactive mode, which launches an interactive view in your web browser. For now, let's produce some static plots! (Don't worry about understanding the parameters yet; we'll go over them later.)

 moddotplot static -f bga24_data/acro_short_arms/*.fa -o output/short_arms --compare --grid

This command should take ~10 minutes to run. The vast majority of this runtime is plot rendering. While it's running, we'll go through the presentation explaining how ModDotPlot works and how to interpret the plots it produces.

Let’s take a look at the outputs

ll output/short_arms/
---------------
seq1.bed    seq2.bed    seq3.bed
seq1_FULL.pdf   seq2_FULL.pdf   seq3_FULL.pdf
seq1_FULL.png   seq2_FULL.png   seq3_FULL.png
seq1_TRI.pdf   seq2_TRI.pdf   seq3_TRI.pdf
seq1_TRI.png    seq2_TRI.png   seq3_TRI.png
seq1_HIST.pdf   seq2_HIST.pdf   seq3_HIST.pdf
seq1_HIST.png   seq2_HIST.png   seq3_HIST.png
seq1_seq2_COMPARE.bed   seq2_seq3_COMPARE.bed
seq1_seq2_COMPARE.pdf   seq2_seq3_COMPARE.pdf
seq1_seq2_COMPARE.png   seq2_seq3_COMPARE.png
seq1_seq3_COMPARE.bed
seq1_seq3_COMPARE.pdf
seq1_seq3_COMPARE.png
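The listing shows three self-identity sets and three `_COMPARE` sets, which is what you would expect from three input sequences: one comparison per unordered pair. A small sketch for estimating this before a long run, assuming each FASTA record is treated as one sequence; `count_records` and `expected_pairs` are hypothetical helper names.

```shell
# Count FASTA records (header lines starting with '>') across the given files.
count_records() {
  cat "$@" | grep -c '^>'
}

# Number of unordered pairs among n sequences: n * (n - 1) / 2.
expected_pairs() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

# Example usage (uncomment once the data directory exists):
# n=$(count_records bga24_data/acro_short_arms/*.fa)
# echo "$n sequences, $(expected_pairs "$n") pairwise comparisons"
```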

Some of the important outputs:

seq*_FULL*
Standard dotplot output. Will produce a pdf and a png.
seq*_TRI*
Upper triangle portion of _FULL. Used for self-identity plots.
seq*_HIST*
A histogram of values, with colors partitioned at each boundary site.
seq*.bed
Paired-end bedfile, with raw Average Nucleotide Identity values for each pairwise set of intervals
*_COMPARE*
For each pairwise combination of sequences. Only shown when multiple inputs are used with the `--compare` or `--compare-only` flag.
*_GRID*
An N x N grid of plots, with self-identity on the diagonal, and matching comparative plots orthogonal. Only produced with `--compare` and
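The `.bed` outputs can be summarized from the command line. The sketch below averages the identity values in one file. It assumes the file is tab-separated and that the identity value is the last column of each row; check the header of your own file before relying on it. `mean_identity` is a hypothetical helper name.

```shell
# Hypothetical helper: mean of the last column of a tab-separated BED file.
# ASSUMPTION: the identity value is the last field. Rows whose last field is
# not a plain number (for example a header line) are skipped.
mean_identity() {
  awk -F'\t' '
    $NF ~ /^[0-9]+(\.[0-9]+)?$/ { sum += $NF; n++ }
    END { if (n > 0) printf "%.2f\n", sum / n; else print "NA" }
  ' "$1"
}

# Example usage (substitute one of your own output files):
# mean_identity output/short_arms/seq1_seq2_COMPARE.bed
```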

Sequences 1, 2, and 3 represent the short arms of Chr13, Chr14, and Chr21. Can you figure out which of these sequences is Chr14? Hint: It contains an inversion in the comparative dotplot, relative to 13 and 21!

Exploring rDNA misassembly

rDNA is, arguably, one of the most difficult regions of the human genome to assemble. This is due to its highly repetitive tandem repeats, which often span megabases in length.

 moddotplot static -f bga24_data/rDNA/*.fa -o output/rdna --compare --grid

Unlike with the short arms, this command should run in under a minute! One of these haplotypes contains a misassembly. Can you figure out which one?

Using interactive mode

ModDotPlot is able to produce a hierarchy of matrices, thanks to the hierarchical sketching approach it uses. These matrices can be saved using the --save option when running ModDotPlot in interactive mode, and accessed without expensive re-computation using -l/--load. Let's open a pre-computed view of an Arabidopsis chromosome:

moddotplot interactive --load bga24_data/arabadopsis/interactive_matrices

The above command launches an application on your machine's localhost, on port 8050 (this can be changed using the --port option). To view it, open your web browser of choice and go to localhost:8050. This might not be possible if you're running ModDotPlot in an HPC environment. Fortunately for our purposes, VSCode and GitPod support automatic port forwarding! This creates an SSH tunnel between GitPod and your local machine, allowing you to view localhost:8050.
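If the page does not load, it can help to confirm that something is actually listening on the port before suspecting the port forwarding. A minimal sketch, assuming bash (the `/dev/tcp` redirection is a bash feature, not POSIX sh); `port_open` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device; other shells will report failure.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open localhost 8050; then
  echo "Something is listening on localhost:8050"
else
  echo "Nothing reachable on localhost:8050 yet"
fi
```

Run this in the same environment where ModDotPlot was launched; a failure there points at the application rather than at the forwarding.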

Note: Expect some latency when port forwarding! Using localhost on your own machine is the fastest way to explore ModDotPlot's interactive mode!

Playing around with Chromosome 2 of the Arabidopsis Col-CEN reference genome, we can see that we can increase our plot's resolution pretty significantly!

Comparative X Centromere

As our final dataset for the day, we want to look at the centromeres of two human X chromosomes: one from CHM13, the other from HG002.

moddotplot interactive -f bga24_data/cenX/*.fa --quick --compare

Adding --quick prevents the creation of hierarchical matrices. Although this sacrifices some interactivity, it is a nice way to quickly explore a genome and play with other parameters.

Issues

For troubleshooting, bug reports, or general questions, please message me on the BGA24 Discord's #moddotplot-2024 channel, or anytime afterwards at alex dot sweeten at nih dot gov. Thanks for using ModDotPlot! :)