Hi there!!
First of all, thanks for developing an open source tool in R that can handle datasets as complex as those common in proteomics and mass spectrometry.
I am working with the MaxLFQ dataset (paper: http://www.mcponline.org/content/13/9/2513.long), in which the developers of the MaxQuant suite mixed the HeLa and E. coli proteomes in 1:1 and 1:3 ratios. One of the crucial steps in their pipeline is the delayed normalization of XICs (extracted ion currents, i.e. the MS1 intensities of the features detected in the apex intensity extraction step) across fractions of the same sample (24 fractions in total) using a Levenberg-Marquardt minimization approach. The normalization factors used to sum up the peptide intensities across fractions are then set to the values that minimise the overall proteome variance, called H(N) in the paper. It's a nice approach to the problem they formulate in the paragraph below:
A major challenge of label-free quantification with prefractionation is that separate sample processing inevitably introduces differences in the fractions to be compared. In principle, correct normalization of each fraction can eliminate this error. However, the total peptide ion signals, necessary in order to perform normalization of the LC MS/MS runs of each fraction, are spread over several adjacent runs. Therefore one cannot sum up the peptide ion signals before one knows the normalization coefficients for each fraction.
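To make the idea concrete, here is a minimal Python sketch of the objective as I read it from the paper: each run gets a factor, a peptide's intensity in a sample is the factor-weighted sum of its XICs over that sample's runs, and H(N) adds up the squared log ratios of those sums over all sample pairs. The function names, the nested-dict layout and the toy numbers are my own assumptions for illustration, and the sketch only evaluates H(N); it does not perform the Levenberg-Marquardt minimization itself.

```python
import math
from itertools import combinations


def summed_intensity(xics, factors):
    """Sum one peptide's XICs over the runs of one sample, scaling each run by its factor."""
    return sum(factors[run] * xic for run, xic in xics.items())


def total_variation(peptides, factors):
    """H(N): sum over peptides and sample pairs of squared log intensity ratios."""
    h = 0.0
    for per_sample in peptides.values():
        totals = {s: summed_intensity(x, factors) for s, x in per_sample.items()}
        for a, b in combinations(sorted(totals), 2):
            # A pair only contributes when the peptide was quantified in both samples.
            if totals[a] > 0 and totals[b] > 0:
                h += math.log(totals[a] / totals[b]) ** 2
    return h


# Toy data (made up): peptide -> sample -> run -> XIC
peptides = {
    "Peptide 1": {
        "A": {"A_F1": 100.0, "A_F2": 300.0},
        "B": {"B_F1": 200.0, "B_F2": 600.0},
    },
    "Peptide 2": {"A": {"A_F1": 50.0}, "B": {"B_F1": 100.0}},
}

unit_factors = {"A_F1": 1.0, "A_F2": 1.0, "B_F1": 1.0, "B_F2": 1.0}
scaled_factors = {"A_F1": 2.0, "A_F2": 2.0, "B_F1": 1.0, "B_F2": 1.0}

h_unit = total_variation(peptides, unit_factors)
h_scaled = total_variation(peptides, scaled_factors)
```

In this toy case sample A is systematically half as intense as sample B, so doubling the factors of A's runs brings `h_scaled` down relative to `h_unit`. A real implementation would hand an objective like this to a least-squares optimiser and fix the overall scale of the factors.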
I've had a look at the MSnbase documentation, and the issue of sample fractions is mentioned in two sections:
- https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.html#2_data_structure_and_content, regarding how to load multiple spectra files in `mzData`, `mzXML` or `mzML` format using `readMSData()`
- https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.html#13_combining_msnset_instances, regarding how to combine multiple runs of the same sample, i.e. technical replicates, if I understood correctly.
However, in none of them is this specific issue of cross-fraction XIC normalization addressed.
Is there any readily available function in the package that could read a table similar to the moFF output:
| Protein(s) | Sequence | ConditionA_Fraction1 | ConditionA_Fraction2 |
|---|---|---|---|
| PXXXX | Peptide 1 | XIC1_A_1 | XIC1_A_2 |
| PYYYY | Peptide 2 | XIC2_A_1 | XIC2_A_2 |
| ... | ... | ... | ... |
where rows represent peptides and columns represent
- the inferred precursor protein
- the peptide sequence
- and, in all remaining columns, the XICs for each individual run (defined by a combination of condition/treatment, replicate and fraction)
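To illustrate the layout I have in mind, here is a small Python sketch (moFF itself is a Python tool) that parses such a table into a peptide-by-run structure. It assumes a tab-separated file and run columns named `<Condition>_<Fraction>`, as in the example header above; the function name, the peptide sequences and the intensities are made up, and this is not meant to describe moFF's actual output format or any existing MSnbase function.

```python
import csv
import io


def read_fraction_table(handle, id_cols=("Protein(s)", "Sequence"), sep="\t"):
    """Parse a peptide table into {(protein, sequence): {condition: {fraction: XIC}}}."""
    reader = csv.DictReader(handle, delimiter=sep)
    table = {}
    for row in reader:
        key = tuple(row[c] for c in id_cols)
        runs = table.setdefault(key, {})
        for col, value in row.items():
            # Skip identifier columns, surplus fields and missing XICs.
            if col is None or col in id_cols or value in ("", None):
                continue
            # Assumption: the text after the last underscore names the fraction.
            condition, fraction = col.rsplit("_", 1)
            runs.setdefault(condition, {})[fraction] = float(value)
    return table


# Made-up example content following the column layout described above.
example = (
    "Protein(s)\tSequence\tConditionA_Fraction1\tConditionA_Fraction2\n"
    "PXXXX\tPEPTIDEK\t1000.0\t250.0\n"
    "PYYYY\tSAMPLER\t\t480.0\n"
)

table = read_fraction_table(io.StringIO(example))
```

Keeping condition and fraction as separate levels is the point: cross-fraction normalization needs to know which runs belong to the same sample before their intensities can be summed.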
Enabling this feature would make it possible to connect moFF output to MSnbase in a straightforward way in datasets where sample fractionation across runs is present.
Thank you very much in advance for your help.
Cheers,
Antonio