Question About Extracting Species-Specific Peptides from Unipept Database #75

@baynec2

Hi,

First of all, thank you for all the work you've done on Unipept — it's an incredible tool, and it’s been super helpful for our metaproteomics work!

I’m currently trying to generate a list of peptides that are unique at the species level by querying the Unipept database. I expected this information might be available in the sequences.tsv.gz file from the 2025-03-19 release, since my understanding was that this file contains peptides and their corresponding lowest common ancestor (LCA).
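A minimal sketch of the kind of filter described above, assuming the file is a gzipped, tab-separated table with the peptide sequence and its LCA taxon ID in known columns. The column positions and the function name `peptides_with_lca` are hypothetical and should be checked against the actual release.

```python
import csv
import gzip
from typing import Iterator

# Assumed 0-based column positions; verify against the real sequences.tsv.gz.
SEQUENCE_COL = 1
LCA_COL = 2


def peptides_with_lca(
    path: str,
    taxon_id: int,
    sequence_col: int = SEQUENCE_COL,
    lca_col: int = LCA_COL,
) -> Iterator[str]:
    """Yield peptide sequences whose LCA column equals the given taxon ID."""
    target = str(taxon_id)
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip rows that are too short to contain both columns.
            if len(row) <= max(sequence_col, lca_col):
                continue
            if row[lca_col] == target:
                yield row[sequence_col]
```

Calling `peptides_with_lca("sequences.tsv.gz", 9606)` would stream every peptide whose stored LCA is Homo sapiens without loading the whole file into memory.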

However, when I tested peptides from that file against the web interface, I noticed mismatches. For example, the peptide AAGGQGLHVTAL has an LCA of 9606 (Homo sapiens) in the file, but the web tool returns an LCA of Mammalia, with 64 UniProt matches. This suggests that the LCA in sequences.tsv.gz is not calculated the same way as in the web version, or that I'm misunderstanding the structure of the file or working with outdated information.
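A spot check like this could also be scripted rather than done by hand in the web interface. The sketch below assumes the public Unipept `pept2lca` API endpoint and its `equate_il` parameter. The helper names are hypothetical, and whether the API's defaults match those of the web tool is something to verify rather than take from this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public Unipept API.
API_URL = "https://api.unipept.ugent.be/api/v1/pept2lca.json"


def build_pept2lca_url(peptides, equate_il=False):
    """Build a pept2lca query URL for one or more peptides."""
    params = [("input[]", peptide) for peptide in peptides]
    # equate_il controls whether isoleucine and leucine are treated as equal.
    params.append(("equate_il", "true" if equate_il else "false"))
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_lca(peptides, equate_il=False, timeout=30):
    """Query the API and return the decoded JSON response.

    The response is assumed to be a list of objects, one per matched peptide.
    """
    url = build_pept2lca_url(peptides, equate_il)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Running `fetch_lca(["AAGGQGLHVTAL"])` with `equate_il` set to each value in turn, and comparing the results with the LCA stored in the file, would show whether that setting accounts for the difference.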

Could you help clarify where I might find accurate species-level unique peptides or how to interpret the LCA field in the file correctly?

Thanks so much for your time!

Best,
Charlie
