Hi,
First of all, thank you for all the work you've done on Unipept — it's an incredible tool, and it’s been super helpful for our metaproteomics work!
I’m currently trying to generate a list of peptides that are unique at the species level by querying the Unipept database. I expected this information to be available in the sequences.tsv.gz file from the 2025-03-19 release, since my understanding was that this file lists each peptide together with its lowest common ancestor (LCA).
However, when testing peptides from that file against the web interface, I noticed mismatches. For example, the peptide AAGGQGLHVTAL has an LCA of 9606 (Homo sapiens) in the file, but the web tool returns an LCA of Mammalia, with 64 UniProt matches. This seems to indicate that the LCA in sequences.tsv.gz is not calculated the same way as in the web version — or perhaps I’m misunderstanding the structure of the file or working with outdated information.
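For context, the filtering described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Unipept's documented workflow: the column positions of the peptide and the LCA taxon ID in sequences.tsv.gz are assumed (check them against the release's schema), and the set of species-rank taxon IDs must come from a separate taxonomy lookup that is not shown here. The function names `species_level_peptides` and `read_sequences` are hypothetical. Note that this only reproduces what the file says; it does not explain the mismatch with the web tool.

```python
import csv
import gzip
from typing import Iterable, Iterator, Set, Tuple


def species_level_peptides(
    lines: Iterable[str],
    species_taxa: Set[int],
    seq_col: int = 0,  # ASSUMPTION: column holding the peptide sequence
    lca_col: int = 1,  # ASSUMPTION: column holding the LCA taxon ID
) -> Iterator[Tuple[str, int]]:
    """Yield (peptide, taxon_id) pairs whose LCA is one of species_taxa.

    species_taxa is assumed to be a precomputed set of taxon IDs at
    species rank, taken from a separate taxonomy table.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= max(seq_col, lca_col):
            continue  # skip short or malformed rows
        lca = row[lca_col].strip()
        if not lca.isdigit():
            continue  # skip header lines and empty/NULL LCA values
        taxon = int(lca)
        if taxon in species_taxa:
            yield row[seq_col], taxon


def read_sequences(path: str) -> Iterator[str]:
    """Stream lines from a gzip-compressed TSV without loading it all into memory."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from handle


if __name__ == "__main__":
    # Illustrative rows only; real input would be read_sequences("sequences.tsv.gz").
    rows = ["AAGGQGLHVTAL\t9606\n", "PEPTIDEK\t12345\n"]
    print(list(species_level_peptides(rows, {9606})))
```

Streaming the file line by line keeps memory use flat, which matters for a dump of this size; the species check is a plain set lookup, so the real work is in building `species_taxa` correctly.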
Could you help clarify where I might find accurate species-level unique peptides or how to interpret the LCA field in the file correctly?
Thanks so much for your time!
Best,
Charlie