Cross-fraction XIC normalization #344

@antortjim

Hi there!!

First of all, thanks for developing an open-source R tool that can handle datasets as complex as those common in proteomics and mass spectrometry.

I am working with the MaxLFQ dataset (paper: http://www.mcponline.org/content/13/9/2513.long), where the developers of the MaxQuant suite mixed the HeLa and E. coli proteomes in 1:1 and 1:3 ratios. One of the crucial steps in their pipeline is the delayed normalization of XICs (extracted ion currents, i.e. the MS1 intensity of the cluster feature detected in the apex intensity extraction step) across fractions of the same sample (24 fractions in total) using a Levenberg-Marquardt minimization approach. The normalization factors used to sum up the peptide intensities across fractions are set to the values that minimise the overall proteome variance, denoted H(N) in the paper. It's a nice approach to the problem they formulate in the paragraph below:

A major challenge of label-free quantification with prefractionation is that separate sample processing inevitably introduces differences in the fractions to be compared. In principle, correct normalization of each fraction can eliminate this error. However, the total peptide ion signals, necessary in order to perform normalization of the LC MS/MS runs of each fraction, are spread over several adjacent runs. Therefore one cannot sum up the peptide ion signals before one knows the normalization coefficients for each fraction.
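To make the quantity being minimised concrete, here is a minimal Python sketch of how I understand H(N). It assumes (my reading of the paper, not verified against MaxQuant itself) that a peptide's intensity in a sample is the sum of its per-fraction XICs, each scaled by that fraction's factor, and that H(N) is the sum over peptides and sample pairs of the squared log ratio of those summed intensities. The names `summed_intensity` and `objective` are hypothetical, and the Levenberg-Marquardt step that would search for the factors is not included.

```python
import math
from itertools import combinations


def summed_intensity(xics, factors):
    """Sum one peptide's per-fraction XICs in one sample, each scaled by its fraction's factor."""
    return sum(n * x for n, x in zip(factors, xics))


def objective(peptides, factors):
    """Evaluate the assumed H(N) for a given set of normalization factors.

    peptides: list of dicts mapping sample name -> list of per-fraction XICs
    factors:  dict mapping sample name -> list of per-fraction factors N
    """
    total = 0.0
    for xics_by_sample in peptides:
        # Compare every pair of samples in which the peptide was quantified.
        for a, b in combinations(sorted(xics_by_sample), 2):
            ia = summed_intensity(xics_by_sample[a], factors[a])
            ib = summed_intensity(xics_by_sample[b], factors[b])
            if ia > 0 and ib > 0:  # skip pairs with no signal in one sample
                total += math.log(ia / ib) ** 2
    return total
```

An optimizer would then vary the entries of `factors` to make `objective` as small as possible, which is only meaningful if most peptides are expected to be unchanged between samples.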

I've had a look at the MSnbase documentation, and the issue of sample fractions is mentioned in two sections:

  1. https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.html#2_data_structure_and_content
    regarding how to load multiple spectra files in mzData, mzXML, or mzML format using readMSData()

  2. https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.html#13_combining_msnset_instances
    regarding how to combine multiple runs of the same sample, i.e. technical replicates, if I understood correctly.

However, neither section addresses this specific issue of cross-fraction XIC normalization.

Is there any readily available function in the package that could read a table similar to the moFF output:

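| Protein(s) | Sequence  | ConditionA_Fraction1 | ConditionA_Fraction2 |
|------------|-----------|----------------------|----------------------|
| PXXXX      | Peptide 1 | XIC1_A_1             | XIC1_A_2             |
| PYYYY      | Peptide 2 | XIC2_A_1             | XIC2_A_2             |
| ...        | ...       | ...                  | ...                  |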

where rows represent peptides and columns represent

  1. the inferred precursor protein
  2. the peptide sequence
  3. in all remaining columns, the XICs for each individual run (defined by a combination of condition/treatment, replicate and fraction)
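To illustrate the layout I have in mind, here is a rough Python sketch of a reader for such a table. It is not an MSnbase or moFF function. It assumes a tab-separated file whose first two columns are the protein and the peptide sequence, and whose remaining column names follow a `Condition_Fraction` pattern as in the example above. The name `read_peptide_table` is hypothetical, and treating empty cells as 0 is my own choice.

```python
import csv
from collections import defaultdict


def read_peptide_table(handle, delimiter="\t"):
    """Read a peptide-by-run XIC table into a list of per-peptide records."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    protein_col, sequence_col = reader.fieldnames[:2]
    run_columns = reader.fieldnames[2:]
    rows = []
    for record in reader:
        xic = defaultdict(dict)
        for column in run_columns:
            # Assumed naming: everything before the last "_" is the condition,
            # the rest identifies the fraction.
            condition, fraction = column.rsplit("_", 1)
            value = record[column]
            # Assumption: an empty cell means the peptide was not quantified.
            xic[condition][fraction] = float(value) if value else 0.0
        rows.append({
            "protein": record[protein_col],
            "sequence": record[sequence_col],
            "xic": dict(xic),
        })
    return rows
```

Grouping the XICs by condition first and by fraction second keeps each fraction's value separate, so that per-fraction normalization factors can be applied before summing.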

Such a feature would make it straightforward to connect moFF output to MSnbase for datasets in which samples are fractionated across runs.

Thank you very much in advance for your help.

Cheers,
Antonio
