
Duplicate peptides in phosphoproteomics data #70

@LJMHiggins

Firstly, thank you for putting together this useful resource.

I have been accessing the phosphoproteomics data for some analyses and have found what looks like peptide duplication.

I am using cptac version 1.5.14.

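I found the duplication by running the following:

```python
import cptac
import pandas as pd

luad = cptac.Luad()
test = luad.get_phosphoproteomics("bcm")
# Flatten the multi-level column index into single strings
flat_columns = ['_'.join(map(str, col)) for col in test.columns]
# Flag every label that already appeared earlier in the list
duplicates = pd.Series(flat_columns).duplicated()
len(pd.Series(flat_columns)[duplicates])
```

This gives 61807 duplicates (columns whose label repeats an earlier column).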
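To separate how many extra copies there are from how many distinct labels are affected, a minimal sketch like the one below could help. It assumes only pandas and works on the column index directly via `Index.duplicated`, which also sidesteps a subtle risk of string flattening: two different tuples can join to the same string (e.g. `("A_B", "C")` and `("A", "B_C")` both become `"A_B_C"`). The helper `count_duplicate_columns` and the toy labels are hypothetical and not part of cptac.

```python
import pandas as pd


def count_duplicate_columns(df):
    """Return (extra copies, distinct labels that repeat) for df's columns.

    Works on the (possibly multi-level) column index itself, so no
    string flattening is needed.
    """
    # True for the 2nd, 3rd, ... occurrence of each label; False for the first
    dup_mask = df.columns.duplicated(keep="first")
    n_extra = int(dup_mask.sum())
    # Iterating a MultiIndex yields tuples, so a set counts distinct labels
    n_labels = len(set(df.columns[dup_mask]))
    return n_extra, n_labels


# Toy data with hypothetical labels: one label twice, another three times
cols = pd.MultiIndex.from_tuples(
    [
        ("GENE1", "S10", "PEPTIDEA"),
        ("GENE1", "S10", "PEPTIDEA"),
        ("GENE2", "T5", "PEPTIDEB"),
        ("GENE2", "T5", "PEPTIDEB"),
        ("GENE2", "T5", "PEPTIDEB"),
        ("GENE3", "Y7", "PEPTIDEC"),
    ]
)
df = pd.DataFrame([[1.0] * 6, [2.0] * 6], columns=cols)
print(count_duplicate_columns(df))  # (3, 2)
```

The first number corresponds to what the flattened-string count above reports; the second shows how many distinct peptide labels are involved.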

Inspecting a specific peptide confirmed the duplication:

```python
# Boolean mask selects each matching column exactly once, even when labels repeat
test.loc[:, ["SCPIKEDSFLQRYSS" in col for col in test.columns]]
```
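To tell whether duplicated columns are exact copies (safe to drop) or carry conflicting values (which would point to a processing issue), one could compare each group of same-labelled columns. This is a sketch assuming only pandas and NumPy; `split_duplicates` and the toy data are hypothetical, not cptac functions. `Series.equals` treats NaNs in the same positions as equal, which matters for sparse phosphosite data.

```python
import numpy as np
import pandas as pd


def split_duplicates(df):
    """Sort duplicated column labels into identical copies vs conflicting values."""
    identical, conflicting = [], []
    dup_labels = set(df.columns[df.columns.duplicated()])
    for label in dup_labels:
        # Select every column carrying this exact label
        mask = np.array([col == label for col in df.columns])
        block = df.loc[:, mask]
        first = block.iloc[:, 0]
        # Series.equals treats NaNs in matching positions as equal
        same = all(first.equals(block.iloc[:, i]) for i in range(1, block.shape[1]))
        (identical if same else conflicting).append(label)
    return sorted(identical), sorted(conflicting)


# Toy data with hypothetical labels: PEPTIDEA copies match, PEPTIDEB copies differ
cols = pd.MultiIndex.from_tuples(
    [
        ("GENE1", "S10", "PEPTIDEA"),
        ("GENE1", "S10", "PEPTIDEA"),
        ("GENE2", "T5", "PEPTIDEB"),
        ("GENE2", "T5", "PEPTIDEB"),
        ("GENE3", "Y7", "PEPTIDEC"),
    ]
)
df = pd.DataFrame(
    [[0.1, 0.1, 0.5, 0.7, 1.0], [np.nan, np.nan, 0.2, 0.2, 2.0]],
    columns=cols,
)
identical, conflicting = split_duplicates(df)
print(identical)    # [('GENE1', 'S10', 'PEPTIDEA')]
print(conflicting)  # [('GENE2', 'T5', 'PEPTIDEB')]

# Keeping the first copy is lossless only for labels in `identical`
deduped = df.loc[:, ~df.columns.duplicated()]
```

If every duplicated label ends up in `identical`, keeping the first copy with `df.loc[:, ~df.columns.duplicated()]` loses no information; any label in `conflicting` would need a closer look before deduplicating.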

I have not yet checked whether this also happens in other tumour types. Do you know why I might be seeing this?

Many thanks
