Cross-fraction XIC normalization #344

Hi there!!

First of all, thanks for developing an open-source tool in R that can handle complex datasets such as those common in proteomics and mass spectrometry.
 
I am working with the MaxLFQ dataset (paper: http://www.mcponline.org/content/13/9/2513.long), in which the developers of the MaxQuant suite mixed the HeLa and E. coli proteomes in 1:1 and 1:3 ratios. One of the crucial steps in their pipeline is the delayed normalization of XICs (extracted ion currents, i.e., the MS1 intensities of the cluster features detected in the apex intensity extraction step) across the fractions of each sample (24 fractions in total), using a Levenberg-Marquardt minimization. The normalization factors used to sum the peptide intensities across fractions are set to the values that minimize the overall between-sample variation of the proteome, a quantity the paper denotes H(N). It is a nice solution to the problem they formulate in the paragraph below:

> A major challenge of label-free quantification with prefractionation is that separate sample processing inevitably introduces differences in the fractions to be compared. In principle, correct normalization of each fraction can eliminate this error. However, the total peptide ion signals, necessary in order to perform normalization of the LC MS/MS runs of each fraction, are spread over several adjacent runs. Therefore one cannot sum up the peptide ion signals before one knows the normalization coefficients for each fraction.
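To make the request concrete, here is a minimal pure-Python sketch of delayed normalization as I read it from the paper. Each LC-MS run gets a coefficient `N`, the summed intensity of peptide `P` in sample `A` is `I_P,A(N) = sum over fractions k of N[A,k] * XIC(P, A, k)`, and `H(N)` is the sum over peptides and sample pairs `(A, B)` of `(log I_P,A(N) - log I_P,B(N))^2`. The optimizer is a textbook Levenberg-Marquardt loop with `lam * I` damping, not MaxQuant's own implementation. Assumptions, all mine: one coefficient per (sample, fraction) run, peptide/sample pairs without signal are skipped, and the `xic[peptide][sample][fraction]` layout and every function name are hypothetical, not part of MSnbase or MaxQuant.

```python
import math
from itertools import combinations

# Hypothetical layout: xic[peptide][sample][fraction] -> XIC of that peptide in that run.
# Assumption: one coefficient N per LC-MS run, i.e. per (sample, fraction) pair,
# parameterized as N = exp(theta) so that it stays positive.


def _residuals(xic, theta, runs):
    """Residuals log I_{P,A}(N) - log I_{P,B}(N) and their Jacobian w.r.t. theta."""
    col = {run: j for j, run in enumerate(runs)}
    res, jac = [], []
    for per_sample in xic.values():
        for a, b in combinations(sorted(per_sample), 2):
            # I_{P,S}(N) = sum over fractions k of N_{S,k} * XIC(P, S, k)
            ia = sum(math.exp(theta[(a, k)]) * v for k, v in per_sample[a].items())
            ib = sum(math.exp(theta[(b, k)]) * v for k, v in per_sample[b].items())
            if ia <= 0 or ib <= 0:
                continue  # assumption: skip pairs where the peptide has no signal
            row = [0.0] * len(runs)
            for k, v in per_sample[a].items():
                row[col[(a, k)]] += math.exp(theta[(a, k)]) * v / ia
            for k, v in per_sample[b].items():
                row[col[(b, k)]] -= math.exp(theta[(b, k)]) * v / ib
            res.append(math.log(ia) - math.log(ib))
            jac.append(row)
    return res, jac


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def delayed_normalization(xic, max_iter=200, tol=1e-20):
    """Minimize H(N), the sum of squared residuals, with Levenberg-Marquardt steps."""
    runs = sorted({(s, k) for ps in xic.values() for s, fr in ps.items() for k in fr})
    theta = {run: 0.0 for run in runs}
    n, lam = len(runs), 1e-3
    for _ in range(max_iter):
        r, jac = _residuals(xic, theta, runs)
        h = sum(x * x for x in r)
        if h < tol:
            break
        jtj = [[sum(row[i] * row[j] for row in jac) for j in range(n)] for i in range(n)]
        grad = [sum(row[i] * ri for row, ri in zip(jac, r)) for i in range(n)]
        improved = False
        while lam < 1e12:
            # Damped normal equations: (J^T J + lam * I) step = -J^T r
            damped = [[jtj[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
            step = _solve(damped, [-g for g in grad])
            trial = {run: theta[run] + step[j] for j, run in enumerate(runs)}
            if sum(x * x for x in _residuals(xic, trial, runs)[0]) < h:
                theta, lam, improved = trial, max(lam / 10, 1e-9), True
                break
            lam *= 10  # step rejected: increase damping (closer to gradient descent)
        if not improved:
            break
    # H(N) is unchanged by a global rescaling of N, so fix the geometric mean to 1.
    mean = sum(theta.values()) / n
    return {run: math.exp(t - mean) for run, t in theta.items()}
```

Because `H(N)` depends only on intensity ratios, multiplying every coefficient by the same constant leaves it unchanged, which is why the sketch pins the geometric mean of `N` to 1. Only ratios between coefficients are meaningful.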

I've had a look at the [MSnbase documentation](https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.html), and the issue of sample fractions is mentioned in two sections:
1. https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.html#2_data_structure_and_content
regarding how to load multiple spectra files in `mzData`, `mzXML` or `mzML` format using `readMSData()`.

2. https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.html#13_combining_msnset_instances
regarding how to combine multiple runs of the same sample, i.e., technical replicates, if I understood correctly.

However, in none of them is this specific issue of cross-fraction XIC normalization addressed.

Is there any readily available function in the package that could read a table similar to the [moFF](https://github.com/compomics/moFF) output, such as the one below, and apply this cross-fraction normalization?

| Protein(s) | Sequence | ConditionA_Fraction1 | ConditionA_Fraction2 |
| ---------- | -------- | -------------------- | -------------------- |
| PXXXX | Peptide 1 | XIC1_A_1 | XIC1_A_2 |
| PYYYY | Peptide 2 | XIC2_A_1 | XIC2_A_2 |
| ... | ... | ... | ... |

Here rows represent peptides, and the columns contain:
1. the inferred parent protein(s);
2. the peptide sequence;
3. in all remaining columns, the XIC for each individual run (defined by a combination of condition/treatment, replicate and fraction).
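A table in this layout can be read with the Python standard library alone. The sketch below assumes a tab-separated file whose run columns follow the `<Condition>_Fraction<k>` pattern of the example header; real moFF output may name its columns differently, and everything before `_Fraction` (condition plus any replicate tag) is treated as the sample name. The function names are hypothetical. It groups the XICs per peptide as `xic[(protein, sequence)][sample][fraction]` and then sums them across fractions with optional per-run coefficients `N[(sample, fraction)]`, which is the summation step that needs the normalization factors first.

```python
import csv
import io
import re

# Assumed header convention, taken from the example table: "<Condition>_Fraction<k>".
RUN_COLUMN = re.compile(r"^(?P<sample>.+)_Fraction(?P<fraction>\d+)$")


def read_fractionated_table(text, protein_col="Protein(s)", seq_col="Sequence"):
    """Parse a tab-separated peptide x run table into
    xic[(protein, sequence)][sample][fraction] -> float.
    Assumption: empty cells, "NA"/"NaN" and non-positive values mean "not detected"."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    runs = {}
    for name in reader.fieldnames or []:
        m = RUN_COLUMN.match(name)
        if m:
            runs[name] = (m.group("sample"), int(m.group("fraction")))
    xic = {}
    for row in reader:
        per_sample = xic.setdefault((row[protein_col], row[seq_col]), {})
        for name, (sample, fraction) in runs.items():
            raw = (row.get(name) or "").strip()
            if raw in ("", "NA", "NaN"):
                continue
            value = float(raw)
            if value > 0:
                per_sample.setdefault(sample, {})[fraction] = value
    return xic


def sum_fractions(xic, coeffs=None):
    """Per-sample peptide intensity: sum over fractions of N[(sample, fraction)] * XIC.
    Runs missing from coeffs get N = 1 (plain, un-normalized summing)."""
    coeffs = coeffs or {}
    return {
        key: {
            sample: sum(coeffs.get((sample, k), 1.0) * v for k, v in fractions.items())
            for sample, fractions in per_sample.items()
        }
        for key, per_sample in xic.items()
    }
```

Calling `sum_fractions(xic)` without coefficients is the naive per-sample sum that the quoted passage warns against; passing fitted per-run coefficients gives normalized intensities instead.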

Enabling this feature would make it straightforward to connect [moFF](https://github.com/compomics/moFF) output to MSnbase for datasets in which each sample is fractionated across several runs.

Thank you very much in advance for your help.

Cheers,
Antonio
