qcdrift: Automated QC Drift Correction for Metabolomics

The qcdrift package is an R-based analytical framework designed to automate the detection, correction, and validation of instrumental drift in high-dimensional mass spectrometry data. It replaces labor-intensive manual spreadsheet workflows with a reproducible, modular pipeline.

Quick Start

If you have never used R or RStudio before, follow these steps to get up and running.

Preparation (one-time setup)

Before running the analysis, you must install the tools R needs to create your graphs.

Open RStudio
Copy and paste the following line into the Console (the bottom-left window) and press Enter:

install.packages(c("devtools", "dplyr", "ggplot2", "patchwork", "ggrepel", "GGally","readxl","tidyr"))

Installation

qcdrift can be installed directly from GitHub using devtools:

devtools::install_github('Hood-BIFX/qcdrift')

User Guide & Functional Walkthrough

The pipeline is engineered with a modular architecture, allowing users to call individual functions for specific tasks or use process_runs() for a complete end-to-end analysis.

Automated Workflow

Once the package is installed and loaded, you can run the entire pipeline with a single command. For this example, we’ll use the provided example dataset located at inst/extdata/RawMassSpec.xlsx.

library(qcdrift)

results <- system.file('extdata/RawMassSpec.xlsx', package = 'qcdrift') |>
  process_runs()

The results object is a list containing the cleaned and corrected data, as well as all generated diagnostic plots. A multi-page PDF report summarizing the key findings can also be saved to your working directory:

# We seem to have lost this at some point, so this is a placeholder for the final report rendering function.
render_report(results, output_file = "Final_QC_Report.pdf")

Individual Functional Modules

The pipeline is built on modular functions that can be called independently. We will walk through the key functions in this section, using the example dataset to illustrate their use and outputs.

Data Import and Cleaning

read_and_clean_data() standardizes the raw Excel file into a “tidy” long-format data frame.

Automated Parsing: It extracts sample names and numerical injection orders from the file header.
QC Identification: It uses a regular expression to classify samples based on the qc_starts_with parameter, which defaults to “QC”.

raw_data <- system.file('extdata/RawMassSpec.xlsx', package = 'qcdrift') |>
  read_and_clean_data()
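To make the QC identification step concrete, here is a minimal base-R sketch of prefix-based classification. The helper name flag_qc() and the sample names are hypothetical and are not taken from the package; the package's own implementation may differ.

```r
# Hypothetical helper (not part of qcdrift): flag samples whose names
# start with a given prefix, returning 1 for QC and 0 otherwise.
flag_qc <- function(sample_names, qc_starts_with = "QC") {
  # "^" anchors the regular expression at the start of the sample name
  pattern <- paste0("^", qc_starts_with)
  as.integer(grepl(pattern, sample_names))
}

# Made-up sample names for illustration only
samples <- c("QC_01", "Sample_A", "QC_02", "Blank_QC")
flag_qc(samples)
```

Because the pattern is anchored, a name such as "Blank_QC" that merely contains the prefix is not flagged as a QC.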

Drift Correction

qc_drift_correction() applies the corret_linear() function (piecewise linear interpolation between QC samples) to correct for QC drift (other options may be added in the future). The function expects the input data frame to have the following columns, which are automatically generated by read_and_clean_data():

io: The injection order (numerical sequence of samples)
abundance: The raw intensity values for each metabolite
qc: A binary vector indicating which samples are QCs (1 for QC, 0 for non-QC)
metabolites: The name or ID of the molecule being measured

corrected_data <- qc_drift_correction(raw_data)
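The idea behind piecewise linear interpolation between QC samples can be sketched for a single metabolite as follows. This is an illustration under stated assumptions, not the package's code: the function name correct_linear_sketch() is hypothetical, the toy values are made up, and rescaling to the mean QC level is one common convention that the package may or may not use.

```r
# Hypothetical sketch of piecewise linear QC drift correction for one metabolite.
correct_linear_sketch <- function(io, abundance, qc) {
  is_qc <- qc == 1
  # Estimate the drift at every injection by linearly interpolating the QC
  # signal; rule = 2 holds the nearest QC value constant outside the QC range.
  drift <- approx(x = io[is_qc], y = abundance[is_qc], xout = io, rule = 2)$y
  # Divide out the estimated drift and rescale to the mean QC level
  # (assumed convention, chosen here so corrected values keep their units).
  abundance / drift * mean(abundance[is_qc])
}

# Made-up data with a downward drift; QCs are injected at io = 1, 4, 7
one_metabolite <- data.frame(
  io        = 1:7,
  abundance = c(100, 95, 88, 90, 78, 74, 80),
  qc        = c(1, 0, 0, 1, 0, 0, 1)
)
corrected <- with(one_metabolite, correct_linear_sketch(io, abundance, qc))
corrected[one_metabolite$qc == 1]
```

In this toy example the three QC injections (100, 90, 80) all end up at their mean level of 90 after correction, which is the behavior a drift correction of this kind aims for.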

Normalization

normalize_data() applies Total Sum Normalization (TSN) to account for injection-level variability, such as differences in sample volume or dilution. Alternatively, specifying method = 'auto' will apply autoscaling (normalization to mean=0 and sd=1) if desired.

normalized_data <- normalize_data(corrected_data)
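For readers unfamiliar with the two methods, the following base-R sketch shows what each does on a small made-up matrix. It assumes a wide layout (rows are samples, columns are metabolites) purely for illustration; the package works on long-format data and its internals may differ.

```r
# Toy matrix (made-up values): rows are samples, columns are metabolites.
x <- matrix(c(10, 20, 30,
              20, 40, 60,
               5, 15, 20),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("S", 1:3), paste0("M", 1:3)))

# Total Sum Normalization: divide each sample by its summed abundance,
# so that every row sums to 1.
tsn <- x / rowSums(x)

# Autoscaling: center each metabolite to mean 0 and scale it to sd 1.
auto <- scale(x, center = TRUE, scale = TRUE)

rowSums(tsn)
colMeans(auto)
apply(auto, 2, sd)
```

Note that S1 and S2 differ only by a factor of two, so they become identical after TSN. This is the kind of dilution difference the method is meant to remove.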

Visualization

The package includes several ggplot2-based functions to validate data integrity.

Principal Component Analysis: `generate_pca_plot()`

Function: Visualizes the global variance structure
Interpretation: Tight clustering of QC samples (blue diamonds) in the “Corrected” plot indicates successful removal of technical noise.
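The clustering criterion can be checked numerically as well as visually. The sketch below uses base R's prcomp() on made-up data and compares the spread of QC scores with the spread of study-sample scores; it only illustrates the interpretation and is not how generate_pca_plot() is implemented.

```r
# Made-up data: three distinct study samples and three near-identical QCs.
x <- rbind(
  S1  = c(10, 50, 90, 30),
  S2  = c(90, 10, 30, 70),
  S3  = c(50, 90, 10, 50),
  QC1 = c(50, 50, 43, 50),
  QC2 = c(51, 49, 44, 51),
  QC3 = c(49, 51, 43, 49)
)
qc <- c(0, 0, 0, 1, 1, 1)

# PCA on centered, unit-variance metabolites; keep the first two components.
pca    <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Largest pairwise distance within each group in PC1-PC2 space.
qc_spread     <- max(dist(scores[qc == 1, ]))
sample_spread <- max(dist(scores[qc == 0, ]))
c(qc = qc_spread, samples = sample_spread)
```

A QC spread that is small relative to the sample spread corresponds to the tight QC cluster described above.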

Precision Tracking: `generate_cv_plot()`

Function: Creates a waterfall plot of Coefficient of Variation (CV) percentages
Interpretation: A horizontal dashed line at 20% marks the acceptable industry threshold for reproducibility.
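The quantity being plotted can be computed by hand as a sanity check. The sketch below calculates the CV percentage per metabolite from made-up QC abundances and flags values above the 20% line; the data frame and the cv_percent and passes column names are hypothetical.

```r
# Made-up QC abundances in long format, using the column names described earlier.
qc_data <- data.frame(
  metabolites = rep(c("M1", "M2"), each = 4),
  abundance   = c(100, 110, 90, 100,
                   50,  80, 20,  50)
)

# CV% = 100 * sd / mean, computed separately for each metabolite.
cv <- aggregate(abundance ~ metabolites, data = qc_data,
                FUN = function(v) 100 * sd(v) / mean(v))
names(cv)[2] <- "cv_percent"

# Flag metabolites at or below the 20% threshold.
cv$passes <- cv$cv_percent <= 20
cv
```

Here M1 varies by only a few percent around its mean and passes, while M2 swings widely and falls well above the threshold.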

Distribution Analysis: `generate_qc_violin()`

Function: Displays log10-transformed abundance distributions across the run.
Interpretation: Consistent median values and interquartile range across samples indicate successful Total Sum Normalization (TSN).

Heatmap Visualization: `generate_qc_heatmap()`

Function: Displays Z-score normalized intensities across injection orders.
Interpretation: The removal of “striping” (vertical color gradients) confirms that temporal decay has been eliminated.
