Hood-BIFX/qcdrift

qcdrift: Automated QC Drift Correction for Metabolomics

The qcdrift package is an R-based analytical framework designed to automate the detection, correction, and validation of instrumental drift in high-dimensional mass spectrometry data. It replaces labor-intensive manual spreadsheet workflows with a reproducible, modular pipeline.

Quick Start

If you have never used R or RStudio before, follow these steps to get up and running.

Preparation (one-time setup)

Before running the analysis, you must install the R packages that qcdrift relies on for data handling and plotting.

  1. Open RStudio
  2. Copy and paste the following line into the Console (the bottom-left window) and press Enter:
install.packages(c("devtools", "dplyr", "ggplot2", "patchwork", "ggrepel", "GGally","readxl","tidyr"))

Installation

qcdrift can be installed directly from GitHub using devtools:

devtools::install_github('Hood-BIFX/qcdrift')

User Guide & Functional Walkthrough

The pipeline is engineered with a modular architecture, allowing users to call individual functions for specific tasks or use process_runs() for a complete end-to-end analysis.

Automated Workflow

Once the package is installed and loaded, you can run the entire pipeline with a single command. For this example, we’ll use the provided example dataset located at inst/extdata/RawMassSpec.xlsx.

library(qcdrift)

results <- system.file('extdata/RawMassSpec.xlsx', package = 'qcdrift') |>
  process_runs()

The resulting results object is a comprehensive list containing the cleaned and corrected data, as well as all generated diagnostic plots. A multi-page PDF report summarizing the key findings can also be saved to your working directory:

# We seem to have lost this at some point, so this is a placeholder for the final report rendering function.
render_report(results, output_file = "Final_QC_Report.pdf")

Individual Functional Modules

The pipeline is built on modular functions that can be called independently. We will walk through the key functions in this section, using the example dataset to illustrate their use and outputs.

Data Import and Cleaning

read_and_clean_data() standardizes the raw Excel file into a “tidy” long-format data frame.

  • Automated Parsing: It extracts sample names and numerical injection orders from the file header.
  • QC Identification: It uses a regular expression to classify samples based on the qc_starts_with parameter, which defaults to “QC”.

raw_data <- system.file('extdata/RawMassSpec.xlsx', package = 'qcdrift') |>
  read_and_clean_data()
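
To make the target shape concrete, here is a minimal base-R sketch of the kind of long-format table described above. The toy wide table, its made-up values, and the reshaping steps are assumptions for illustration only; read_and_clean_data() may parse the Excel file quite differently. Only the "QC" prefix and the output column names (io, abundance, qc, metabolites) are taken from this README.

```r
# Toy wide table: one row per injection, one column per metabolite (made-up values).
wide <- data.frame(
  sample  = c("QC_1", "S_1", "S_2", "QC_2"),
  io      = 1:4,
  alanine = c(100, 80, 95, 90),
  glucose = c(200, 150, 210, 180)
)

qc_starts_with <- "QC"
met_cols <- setdiff(names(wide), c("sample", "io"))

# Stack the metabolite columns into long format: one row per sample x metabolite.
long <- data.frame(
  sample      = rep(wide$sample, times = length(met_cols)),
  io          = rep(wide$io, times = length(met_cols)),
  metabolites = rep(met_cols, each = nrow(wide)),
  abundance   = unlist(wide[met_cols], use.names = FALSE)
)

# Flag QC injections: 1 if the sample name starts with the prefix, else 0.
long$qc <- as.integer(grepl(paste0("^", qc_starts_with), long$sample))

print(long)
```

Each injection now contributes one row per metabolite, which is the layout the downstream functions are described as expecting.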

Drift Correction

qc_drift_correction() corrects for instrumental drift by applying correct_linear(), which performs piecewise linear interpolation between QC samples (other correction methods may be added in the future). The function expects the input data frame to have the following columns, which are automatically generated by read_and_clean_data():

  • io: The injection order (numerical sequence of samples)
  • abundance: The raw intensity values for each metabolite
  • qc: A binary vector indicating which samples are QCs (1 for QC, 0 for non-QC)
  • metabolites: The name or ID of the molecule being measured

corrected_data <- qc_drift_correction(raw_data)
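
The idea of piecewise linear interpolation between QC samples can be sketched in base R with approx(). This is an illustration under stated assumptions, not the package's implementation: the helper name correct_linear_sketch() is hypothetical, the values are made up, and dividing by the interpolated QC trend and rescaling to the median QC level is one common formulation that qcdrift may or may not use.

```r
# Toy data for a single metabolite (made-up values); QC injections bracket the run.
df <- data.frame(
  io        = 1:7,
  qc        = c(1, 0, 0, 1, 0, 0, 1),
  abundance = c(100, 98, 93, 90, 84, 82, 80)
)

# Hypothetical helper, not part of qcdrift.
correct_linear_sketch <- function(io, abundance, qc) {
  is_qc <- qc == 1
  # Piecewise linear QC trend evaluated at every injection; rule = 2 holds the
  # nearest QC value constant outside the first and last QC.
  trend <- approx(x = io[is_qc], y = abundance[is_qc], xout = io, rule = 2)$y
  # Divide out the trend and rescale to the median QC level (assumed formula).
  abundance / trend * median(abundance[is_qc])
}

df$corrected <- correct_linear_sketch(df$io, df$abundance, df$qc)
print(df)
```

In a full long-format table the same calculation would be applied separately to each metabolite. After correction the QC injections in this toy example all sit at the same level, which is the behavior the diagnostic plots are meant to confirm.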

Normalization

normalize_data() applies Total Sum Normalization (TSN) to account for injection-level variability, such as differences in sample volume or dilution. Alternatively, specifying method = 'auto' will apply autoscaling (each metabolite is scaled to mean = 0 and sd = 1).

normalized_data <- normalize_data(corrected_data)
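
The two normalization options can be sketched in base R as follows. The toy table and the exact formulas are assumptions: TSN is shown as dividing each value by its sample's total abundance, and autoscaling as a per-metabolite z-score. normalize_data() may differ in details such as rescaling factors or handling of missing values.

```r
# Toy long-format table (made-up values): 3 samples x 2 metabolites.
long <- data.frame(
  sample      = rep(c("S1", "S2", "S3"), times = 2),
  metabolites = rep(c("alanine", "glucose"), each = 3),
  abundance   = c(10, 20, 30, 30, 20, 70)
)

# Total Sum Normalization: divide each value by its sample's total abundance.
long$tsn <- long$abundance / ave(long$abundance, long$sample, FUN = sum)

# Autoscaling: per metabolite, subtract the mean and divide by the standard deviation.
long$auto <- ave(long$abundance, long$metabolites,
                 FUN = function(x) (x - mean(x)) / sd(x))

print(long)
```

After TSN the values within each sample sum to 1, so differences in overall signal between injections no longer dominate comparisons across samples.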

Visualization

The package includes several ggplot2-based functions to validate data integrity.

Principal Component Analysis: generate_pca_plot()

  • Function: Visualizes the global variance structure
  • Interpretation: Tight clustering of QC samples (blue diamonds) in the “Corrected” plot indicates successful removal of technical noise.
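
For readers who want to see what sits behind such a plot, the sketch below computes PCA scores with base R's prcomp() on a small simulated matrix. The data, the sample names, and the choice to center and scale are all assumptions; how generate_pca_plot() prepares its input is not described here.

```r
set.seed(1)

# Toy matrix (simulated values): 6 samples in rows, 4 metabolites in columns.
x <- matrix(
  rnorm(24, mean = 100, sd = 10),
  nrow = 6,
  dimnames = list(c("QC_1", "S_1", "S_2", "QC_2", "S_3", "QC_3"),
                  paste0("m", 1:4))
)

# PCA on centered, unit-variance metabolites.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Scores on the first two components, with a QC flag for plotting.
scores <- as.data.frame(pca$x[, 1:2])
scores$qc <- grepl("^QC", rownames(scores))

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(scores)
print(round(var_explained, 3))
```

Plotting PC1 against PC2 and coloring by the qc flag gives the kind of view described above, where tightly grouped QC points suggest little remaining technical variation.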

Precision Tracking: generate_cv_plot()

  • Function: Creates a waterfall plot of Coefficient of Variation (CV) percentages
  • Interpretation: A horizontal dashed line at 20% marks the acceptable industry threshold for reproducibility.
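
The quantity behind this plot can be sketched in a few lines of base R. The toy QC table and its values are made up, and computing CV per metabolite across QC injections is an assumption about how generate_cv_plot() derives its bars.

```r
# Toy QC-only table (made-up values): 3 QC injections x 2 metabolites.
qc_only <- data.frame(
  metabolites = rep(c("alanine", "glucose"), each = 3),
  abundance   = c(98, 100, 102, 60, 100, 140)
)

# CV% = 100 * sd / mean, computed per metabolite across QC injections.
cv <- aggregate(abundance ~ metabolites, data = qc_only,
                FUN = function(x) 100 * sd(x) / mean(x))
names(cv)[2] <- "cv_percent"

# Flag metabolites at or below the 20% threshold drawn on the plot.
cv$passes <- cv$cv_percent <= 20

print(cv)
```

Sorting cv by cv_percent and drawing one bar per metabolite gives the waterfall shape, with the 20% line separating metabolites that pass from those that do not.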

Distribution Analysis: generate_qc_violin()

  • Function: Displays log10-transformed abundance distributions across the run.
  • Interpretation: Consistent median values and interquartile ranges across samples indicate successful Total Sum Normalization (TSN).

Heatmap Visualization: generate_qc_heatmap()

  • Function: Displays Z-score normalized intensities across injection orders.
  • Interpretation: The removal of “striping” (vertical color gradients) confirms that temporal decay has been eliminated.

License

MIT (LICENSE.md). A second LICENSE file is also present; its license type was not detected.
