Skip to content

moka-guys/Coverage_Uniformity_Report

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 

Repository files navigation

R Script and docker files sambamba_coverage_uniformity v1.2

What does this app do?

This RScript uses the calculated Sambamba coverage bed files for x samples to visualise the variability in coverage at each genomic region across all these samples. Results are available as an interactive plot, simplified pdf plot, and csv table. A Dockerfile is included to aid sharing the scripts and is used in the dx app sambamba_coverage_uniformity v1.2 (See moka-guys/dnanexus_coverage_uniformity_report)

What are typical use cases for this script?

This app can be used during the development of new diagnositic tests to identify regions which have low coverage.

What inputs are required for this app to run?

This app requires the following inputs:

  • Name of project containing the samabamba output folder coverage/raw_output/.

Optional Inputs & flags:

  • input_directory: Folder containing sambamba output. Default = /coverage/raw_output/
  • group_by: By default the app produces a plot for each unique Pan number which coverage was calculated for. If the user would like to group Pan numbers together they can provide a string in the format of VCP1=PanA,PanB;VCP2=PanX,PanY,PanZ;. Typical use cases would be where the Pan number represents a disease area and you want to group these together by the capture kit used.
  • plot_figures: Produce plots of output (PDF & HTML) Default = True
  • simple_plot_only: Use with plot_figures flag to produce only the PDF plot (Useful for large samples which may cause performance issues when producing interactive plots) Default = False
  • no_jitter: Turn off the overlayed jittered geom_points in the interactive plots leaving the data as summarised boxplots only (Showing all the data points for large data sets may cause performance issues) Default = False

How does this script work?

The R script which reads in all data from the folder coverage/raw_output/ in the project, cleans up the data, and outputs html, pdf, and csv reports.

What does this script output?

This app outputs:

  • An interactive HTML report with boxplots of every region sorted by coverage from low > high
  • A static plot showing the average for each region
  • A csv file with the region data in tabular form

What are the limitations of this app?

Large data sets containing many regions/samples may require turning off the interactive plots due to performance issues.

About

Standalone R script and Docker components used by the DNA Nexus App moka-guys/dnanexus_coverage_uniformity_report to calculate uniformity of coverage across multiple samples.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors