@talagayev
Member

@talagayev talagayev commented Nov 14, 2024

Fixes #4563

Changes made in this Pull Request:

  • Creation of the SugarSelection class, which selects sugars by matching
    known PDB, CHARMM and GLYCAM abbreviations
  • Addition of the GLYCAM and SUGAR_PDB files to MDAnalysisTests.data
  • Addition of test_sugar_glycam_selection() and test_sugar_pdb_selection()
    in test_atomselections.py
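
As a rough illustration of the name-based matching described in the list above, here is a minimal sketch in plain Python. It is not the SugarSelection implementation from this PR: the function name `select_sugar_residues` and the small `SUGAR_RESNAMES` table are hypothetical, and the table holds only a handful of common PDB sugar names rather than the full list.

```python
# Hypothetical, heavily reduced name table; the PR's actual table is much larger.
SUGAR_RESNAMES = frozenset({"NAG", "GLC", "MAN", "GAL", "BMA", "FUC"})


def select_sugar_residues(resnames):
    """Return the indices of residues whose name is a known sugar abbreviation.

    Matching is exact and case sensitive, because GLYCAM-style names
    can differ only by letter case.
    """
    return [i for i, name in enumerate(resnames) if name.strip() in SUGAR_RESNAMES]


# Residue names as they might appear in a small structure file (made-up example).
residues = ["ALA", "NAG", "HOH", "GLC", "GOL"]
sugar_indices = select_sugar_residues(residues)
print(sugar_indices)
```

Glycerol (GOL) is deliberately absent from the table, so it is not matched.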

Currently I used the following abbreviations:

https://glycam.org/docs/othertoolsservice/2016/06/09/3d-snfg-list-of-residue-names/index.html

In addition, I used the GLYCAM web server to obtain the known sugar abbreviations, and I also
included the aglycons that I obtained from the GLYCAM web server.

The test files were retrieved from the RCSB PDB and the GLYCAM web server:

https://glycam.org/

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4790.org.readthedocs.build/en/4790/

Implementation of SugarSelection with the known abbreviations and aglycans obtained from the glycam webserver
@pep8speaks

pep8speaks commented Nov 14, 2024

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2025-01-11 17:54:06 UTC

@codecov

codecov bot commented Nov 14, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.73%. Comparing base (4c12f15) to head (06ef329).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4790   +/-   ##
========================================
  Coverage    92.72%   92.73%           
========================================
  Files          180      180           
  Lines        22475    22483    +8     
  Branches      3190     3190           
========================================
+ Hits         20841    20849    +8     
  Misses        1177     1177           
  Partials       457      457           

@talagayev
Member Author

talagayev commented Nov 15, 2024

@hmacdope I will ping you here since you were one of the participants in the discussion that initiated the Issue that this PR is covering.

The main problem I currently see is that GLYCAM has a large number of different names/abbreviations for the sugars because of the many possible combinations.

Some of them could lead to tricky cases; for example, the allose nomenclature has RNA as one of its abbreviations.

I checked the RCSB PDB and found only one case where a unique ligand was called RNA, but it could still be dangerous if users call something "RNA" in their files.

As for coverage: looking at the PDB files in the RCSB PDB, the selection covers NAG, GLC etc., which is convenient if somebody wants to select those in PDB files. It does not cover, for example, glycerol, which makes sense since glycerol is not a sugar, but glycerol is quite common in crystal structures alongside sugars such as NAG and GLC. Does it make sense to have a selection that would somehow cover both glycerol and similar compounds together with sugars?
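
To make the "RNA" concern concrete, here is a small plain-Python sketch of one possible mitigation: letting the caller exclude ambiguous names from a name-based match. The `filter_sugars` helper and the three-entry `SUGAR_NAMES` table are hypothetical and are not part of this PR.

```python
def filter_sugars(resnames, sugar_names, exclude=()):
    """Return the residue names matched as sugars, skipping excluded names."""
    excluded = set(exclude)
    return [name for name in resnames if name in sugar_names and name not in excluded]


# Hypothetical name table; "RNA" stands in for the ambiguous allose abbreviation.
SUGAR_NAMES = {"NAG", "GLC", "RNA"}

# Made-up residue names, including glycerol (GOL), which is not in the table.
resnames = ["NAG", "RNA", "GOL", "GLC"]

all_matches = filter_sugars(resnames, SUGAR_NAMES)
safe_matches = filter_sugars(resnames, SUGAR_NAMES, exclude=["RNA"])
print(all_matches, safe_matches)
```

In the selection language, the equivalent would presumably be to combine the proposed token with the existing `not resname ...` syntax, assuming the token composes like other keywords.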

@talagayev talagayev marked this pull request as ready for review January 11, 2025 17:54
MadhankumarAI added a commit to MadhankumarAI/mdanalysis that referenced this pull request Jan 17, 2026
Adds a minimal sugar selection that matches common GLYCAM06
hexose residue names (Glc, Gal, Man).

This mirrors existing residue-based selectors in the codebase
and intentionally limits the initial scope to well-documented
cases.

Closes MDAnalysis#4790.
Member

@RMeli RMeli left a comment

LGTM

Member

@IAlibay IAlibay left a comment

Needs updating against main and a couple of small things, otherwise this is good.

Member

gzip this file?

Member Author

compressed the file to 6kya.pdb.gz

Member

This is small enough to be ok non-gzipped.

# DSSP testing: from https://github.com/ShintaroMinami/PyDSSP
DSSP = (_data_ref / "dssp").as_posix()

GLYCAM = (_data_ref / 'GLYCAM_sugars.pdb').as_posix()
Member

Add some comments to describe these files?

Member Author

Added the description :)

directly passing them. (Issue #3520, PR #5006)

Enhancements
* Addition of 'sugar' token for GLYCAM, PDB and CHARMM sugar selection (Issue #4790)
Member

Sorry my bad - it should go in 2.11! (I forget what we have and haven't released)

Member Author

All good, will adjust it :)

Member Author

Adjusted it to 2.11

@talagayev
Member Author

I also wanted to get your quick opinion on this PR @IAlibay. When we implemented the water token in one of the PRs, you mentioned that adding such tokens can lead to users not carefully checking their files and thus deleting something, so while a token improves user-friendliness it also potentially increases user mistakes.

I have a small worry with this PR: given the large number of possible three-letter abbreviations, some may overlap with co-crystallized ligands. For example, I have only seen NAG used for sugars in PDB files, so I would be surprised if a ligand had that abbreviation. But I would not be surprised to encounter something like 6lA, or something along those lines, in some PDB file as the identifier of a co-crystallized ligand, which would lead to that ligand being selected as soon as the user uses the sugar token.

What is your opinion on that? How likely is something like that to happen, especially since the token here covers every possible GLYCAM nomenclature, which leads to many IDs falling under it 🤔
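
One way a user could guard against the overlap described above is to inspect what a broad name-based match actually captures before acting on it. The following plain-Python sketch tallies matched residue names. The `audit_matches` helper and the two-entry name table are hypothetical and only illustrate the idea.

```python
from collections import Counter


def audit_matches(resnames, sugar_names):
    """Count how often each matched residue name occurs, most common first."""
    counts = Counter(name for name in resnames if name in sugar_names)
    return counts.most_common()


# Hypothetical subset of the name table, using the two names mentioned above.
SUGAR_NAMES = {"NAG", "6lA"}

# Made-up residue names from a structure with one unusual ligand identifier.
resnames = ["NAG", "NAG", "ALA", "6lA", "NAG"]

report = audit_matches(resnames, SUGAR_NAMES)
print(report)
```

A name that is matched only once, such as the single 6lA entry here, would be a natural candidate for a manual check that it is a sugar and not a co-crystallized ligand.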

talagayev and others added 5 commits January 21, 2026 22:52
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
moved to 2.11.0
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
@talagayev talagayev requested a review from IAlibay January 22, 2026 00:27
Development

Successfully merging this pull request may close these issues.

Create sugar or carbohydrate selection using GLYCAM nomenclature

5 participants