Skip to content

Integration of CDK ExhaustiveFragmenter#133

Open
JonasSchaub wants to merge 25 commits intoproductionfrom
dev-tom-exh-frag
Open

Integration of CDK ExhaustiveFragmenter#133
JonasSchaub wants to merge 25 commits intoproductionfrom
dev-tom-exh-frag

Conversation

@JonasSchaub
Copy link
Copy Markdown
Collaborator

No description provided.

@JonasSchaub JonasSchaub added the enhancement New feature or request label Oct 7, 2024
@JonasSchaub JonasSchaub changed the title Initialization of new fragmenter class Integration of CDK ExhaustiveFragmenter Oct 7, 2024
@ToLeWeiss ToLeWeiss marked this pull request as ready for review October 22, 2024 15:18
@JonasSchaub
Copy link
Copy Markdown
Collaborator Author

JonasSchaub commented Oct 23, 2024

@ToLeWeiss , I fear performance will be a big issue here. I just started a fragmentation with 1000 natural products and it is still running. We need to discuss this, test it more, and maybe document it somewhere.

Final result of my run:

Oct 23, 2024 1:45:35 PM de.unijena.cheminf.mortar.controller.MainViewController lambda$importMoleculeFile$25
INFO: Successfully imported 1000 molecules from file: Picked_COCONUT_subset_1000_mols.sdf; 0 molecules could not be parsed into the internal data model (SMILES code generation failed). See above how many molecules could not be read from the input file at all or produced exceptions while preprocessing.
Oct 23, 2024 1:45:41 PM de.unijena.cheminf.mortar.controller.MainViewController startFragmentation
INFO: Start of method startFragmentation
Oct 23, 2024 1:45:41 PM de.unijena.cheminf.mortar.model.fragmentation.FragmentationService startFragmentation
INFO: Fragmentation "Exhaustive Fragmenter" (Exhaustive Fragmenter) starting. Current memory consumption: 168 MB
Oct 23, 2024 2:23:38 PM de.unijena.cheminf.mortar.model.fragmentation.FragmentationService startFragmentation
INFO: Fragmentation "Exhaustive Fragmenter" (Exhaustive Fragmenter) of 1,000 molecules complete. It took 2,277,137 ms. Current memory consumption: 2,496 MB
Oct 23, 2024 2:23:38 PM de.unijena.cheminf.mortar.model.fragmentation.FragmentationService startSingleFragmentation
INFO: Number of different fragments extracted: 169,490
Oct 23, 2024 2:23:39 PM de.unijena.cheminf.mortar.controller.MainViewController lambda$startFragmentation$44
INFO: End of method startFragmentation after 2277.6163057

And afterwards, it aborted with an out of memory error...
But as you can see, 169,490 fragments were generated from 1,000 natural products in 38 minutes (4 parallel threads and while I was also working on other things on the same laptop).

Copy link
Copy Markdown
Collaborator Author

@JonasSchaub JonasSchaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just some minor things in the comments.

@JonasSchaub
Copy link
Copy Markdown
Collaborator Author

When the new fragmenter is added, we also need to describe it in the tutorial: https://docs.google.com/document/d/1dbpSZfdmOYaQqdYTDZj0NfKuy9TMp6T4SF6lCBB7bFo/edit?usp=sharing
@ToLeWeiss could you suggest some edits there? I need to redo some of the pictures, so that they are consistent.

@JonasSchaub
Copy link
Copy Markdown
Collaborator Author

@ToLeWeiss ready for re-review?

@ToLeWeiss
Copy link
Copy Markdown
Collaborator

Now it is ready for a review.

Copy link
Copy Markdown
Collaborator Author

@JonasSchaub JonasSchaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

* from the CDK, available for MORTAR.
* from the CDK available for MORTAR. It has a performance of O(n!) where n is the number of splittable bonds. Splittable
* bonds are defined as non-ring, single bonds that are connected to at least one other atom, that is <b>not</b> an
* implicit hydrogen.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the term "terminal" here

@JonasSchaub
Copy link
Copy Markdown
Collaborator Author

@ToLeWeiss everything looks very good and ready to merge but based on what we just discussed, I would not merge it now, so that you can do your re-implementation of the exhaustive fragmentation here. What do you think?

@ToLeWeiss
Copy link
Copy Markdown
Collaborator

Do you mean a side by side implementation of the CDK Exhaustive Fragmenter and a "fast" Exhaustive Fragmenter approach here @JonasSchaub ?

@JonasSchaub
Copy link
Copy Markdown
Collaborator Author

Do you mean a side by side implementation of the CDK Exhaustive Fragmenter and a "fast" Exhaustive Fragmenter approach here @JonasSchaub ?

I mean that it probably won't make sense to integrate the CDK ExhaustiveFragmenter right now, in the state that it is in, but rather wait for your optimised implementation.

@ToLeWeiss
Copy link
Copy Markdown
Collaborator

Ah okay.
Yes that seems reasonable.

@JonasSchaub
Copy link
Copy Markdown
Collaborator Author

@ToLeWeiss please merge the production branch into your current branch when you have time (tell me if you need help with that).

@ToLeWeiss
Copy link
Copy Markdown
Collaborator

I would be happy to merge production into this branch to be up to date, but which strategy (normal merge or rebase) is preferred here @JonasSchaub ?

@JonasSchaub
Copy link
Copy Markdown
Collaborator Author

I would be happy to merge production into this branch to be up to date, but which strategy (normal merge or rebase) is preferred here @JonasSchaub ?

A normal merge would be the way to go (less dangerous than a rebase).

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
10.6% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@sonarqubecloud
Copy link
Copy Markdown

@ToLeWeiss
Copy link
Copy Markdown
Collaborator

Hey @JonasSchaub the code would be ready for a review if you happen to find some time on your hands. (It seems i can not request a review because we probably are both authors)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new fragmentation algorithm integration by wrapping CDK’s ExhaustiveFragmenter, exposing its settings to the MORTAR GUI/settings system, and registering it in the fragmentation service.

Changes:

  • Introduces CDKExhaustiveFragmenter implementing IMoleculeFragmenter (settings, filtering/preprocessing, fragmentation).
  • Registers the new fragmenter in FragmentationService and adds en-GB UI strings for its settings.
  • Adds JUnit tests for basic instantiation and sample fragmentations (incl. stereo preprocessing behavior).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/main/java/de/unijena/cheminf/mortar/model/fragmentation/algorithm/CDKExhaustiveFragmenter.java New wrapper around CDK exhaustive fragmentation with GUI-exposed settings and preprocessing/filter logic.
src/main/java/de/unijena/cheminf/mortar/model/fragmentation/FragmentationService.java Registers the new fragmenter instance in the service’s fragmenter list.
src/main/resources/de/unijena/cheminf/mortar/message/Message_en_GB.properties Adds display names/tooltips for the new fragmenter’s settings.
src/test/java/de/unijena/cheminf/mortar/model/fragmentation/algorithm/CDKExhaustiveFragmenterTest.java Adds unit tests for basic settings access, fragmentation, and stereo preprocessing.
gradlew Adds the Gradle POSIX wrapper script.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Collaborator Author

@JonasSchaub JonasSchaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already looks quite good. You have to clean up some names, some doc, and some parameter inconsistencies, though. My "biggest" point is to add another setting that dis- and enables the a priori filtering by nr. of splittable bonds.

Comment thread src/main/resources/de/unijena/cheminf/mortar/message/Message_en_GB.properties Outdated
Comment thread src/main/resources/de/unijena/cheminf/mortar/message/Message_en_GB.properties Outdated
CDKExhaustiveFragmenter.splittableBondsThreshold.displayName = Splittable Bonds threshold setting
CDKExhaustiveFragmenter.minFragmentSize.tooltip = Defines the minimum number of atoms (inclusive) that are required for one Fragment. The number of atoms consists of all atoms, that are connected by more than one single bond or have more than one single bond.
CDKExhaustiveFragmenter.minFragmentSize.displayName = Minimum fragment size setting
CDKExhaustiveFragmenter.inclusiveMaxTreeDepth.tooltip = Defines the maximum depth of the fragment tree. This setting controls how deeply the fragmentation algorithm will explore molecular fragment combinations. A higher value allows for more extensive fragmentation but may increase computational complexity.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be shortened a bit. And it definitely does increase comp. complexity!

Comment thread src/main/resources/de/unijena/cheminf/mortar/message/Message_en_GB.properties Outdated
Comment thread src/main/resources/de/unijena/cheminf/mortar/message/Message_en_GB.properties Outdated
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
4.3% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants