Commit 06d22fc

Merge pull request #16 from JonasSchaub/sdu-integration
SugarDetectionUtility integration and re-work of input file type detection (enhanced SMILES file parsing)
2 parents 5b681fd + a5360e3 commit 06d22fc

21 files changed

Lines changed: 3383 additions & 403 deletions

.github/workflows/publish-javadoc.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -21,5 +21,5 @@ jobs:
2121
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
2222
javadoc-branch: javadoc
2323
java-version: 21
24-
target-folder: javadoc/1.5
24+
target-folder: javadoc/1.6
2525
project: maven

CITATION.cff

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
cff-version: 1.2.0
22
title: Sugar Removal Utility (SRU)
3-
version: 1.5.0.0
3+
version: 1.6.0.0
44
message: "If you use this software, please cite it as below and also cite the accompanying scientific publication referenced below."
55
type: software
66
authors:
@@ -17,7 +17,7 @@ authors:
1717
given-names: "Maria"
1818
orcid: "https://orcid.org/0000-0001-9359-7149"
1919
doi: "10.5281/zenodo.7082113"
20-
date-released: 2025-03-19
20+
date-released: 2025-10-01
2121
url: "https://github.com/JonasSchaub/SugarRemoval"
2222
license: MIT
2323
references:

README.md

Lines changed: 11 additions & 8 deletions
@@ -33,18 +33,20 @@ removal of circular and linear sugars from molecular structures, is described in
3333
C. et al. Too sweet: cheminformatics for deglycosylation in natural products. J Cheminform 12, 67 (2020)](https://doi.org/10.1186/s13321-020-00467-y). There,
3434
you can find all necessary details about the algorithm and its various configuration options. We also published a
3535
[follow-up article](https://doi.org/10.3390/biom11040486) where we used the SRU to analyse sugar moieties in the
36-
Collection of Open Natural products (COCONUT) database.
36+
Collection of Open Natural products (COCONUT) database. Recently, we also developed an extension called Sugar Detection
37+
Utility (SDU) which is described here, for now: [https://github.com/cdk/cdk/pull/1225](https://github.com/cdk/cdk/pull/1225).
3738
* This repository *used to* host the SRU source code, but it has now moved to the
3839
<a href="https://github.com/cdk/cdk">Chemistry Development Kit (CDK)</a> Java library for cheminformatics. If you want to
3940
use the SRU as a Java library, you now need to use the CDK version 2.10 or higher. Information on how to install and use
4041
the CDK can be found in the GitHub repository linked above. You can then use the SRU via CDK's
41-
[SugarRemovalUtility class](https://github.com/cdk/cdk/blob/main/misc/extra/src/main/java/org/openscience/cdk/tools/SugarRemovalUtility.java).
42+
[SugarRemovalUtility class](https://github.com/cdk/cdk/blob/main/misc/extra/src/main/java/org/openscience/cdk/tools/SugarRemovalUtility.java)
43+
or the [SugarDetectionUtility extension](https://github.com/cdk/cdk/blob/main/misc/extra/src/main/java/org/openscience/cdk/tools/SugarDetectionUtility.java).
4244
* This repository now only hosts the SRU command-line application and its source code and it serves as a place for
4345
documentation about the algorithm.
4446
* The SRU's functionalities can also be used in other software tools:
4547
* The SRU web application is available at [https://sugar.naturalproducts.net](https://sugar.naturalproducts.net) and its source code
4648
can be found [here](https://github.com/mSorok/SugarRemovalWeb).
47-
* The Sugar Removal Utility is also available in the open Java rich client application MORTAR ('MOlecule fRagmenTation
49+
* The Sugar Removal/Detection Utility is also available in the open Java rich client application MORTAR ('MOlecule fRagmenTation
4850
fRamework') where <i>in silico</i> molecule fragmentation can be easily conducted on a given data set and the results
4951
visualised ([MORTAR GitHub repository](https://github.com/FelixBaensch/MORTAR)
5052
| [MORTAR article](https://doi.org/10.1186/s13321-022-00674-9))
@@ -59,8 +61,8 @@ moiety detection and removal using the SRU.
5961
### Sources
6062
The sources available in <i>/src/main/java/de/unijena/cheminf/deglycosylation/</i> belong to the SRU command-line
6163
application. It makes the various settings for fine-tuning the sugar detection and removal process available through
62-
command-line arguments. But using the CDK <i>SugarRemovalUtility</i> class directly in your own software project offers some
63-
additional configuration options and functionalities:
64+
command-line arguments. But using the CDK <i>SugarRemovalUtility / SugarDetectionUtility</i> classes directly in your
65+
own software project offers some additional configuration options and functionalities:
6466
* Adding and removing circular and linear sugar patterns for the initial detection steps
6567
* Sugar detection without removal
6668
* Detecting only the number of sugar moieties of a molecule
@@ -69,6 +71,7 @@ The class <i>SugarRemovalUtilityTest</i> can be found in the directory
6971
<i>/src/test/java/de/unijena/cheminf/deglycosylation/</i>. It is a JUnit test class that tests the performance of the
7072
Sugar Removal Utility on multiple specific molecular structures of natural products hand-picked from public databases
7173
(see article linked above). Code examples of how to use and configure the <i>SugarRemovalUtility</i> class can be found here.
74+
There is also an analogous test class for the <i>SugarDetectionUtility</i>.
7275

7376
### SugarRemovalUtility CMD App
7477
The sub-folder ["SugarRemovalUtility CMD App"](https://github.com/JonasSchaub/SugarRemoval/tree/main/SugarRemovalUtility%20CMD%20App)
@@ -93,12 +96,12 @@ As stated above, the Sugar Removal Utility is now part of the
9396
install the SRU externally, you can use it via CDK's SugarRemovalUtility class. If not, please follow the installation
9497
description in the CDK repository linked above.
9598
<br>
96-
The Sugar Removal Utility web applcation in *this* repository is hosted as a package/artifact on the sonatype maven
99+
The Sugar Removal Utility web application in *this* repository is hosted as a package/artifact on the sonatype maven
97100
central repository. See the [artifact page](https://central.sonatype.com/artifact/io.github.jonasschaub/sru/) for installation guidelines using build tools like maven or gradle.
98101
To install it via its JAR archive, you can get it from the [releases](https://github.com/JonasSchaub/SugarRemoval/releases).
99102
Note that other dependencies will need to be installed via JAR archives as well this way.
100103

101-
### Command line application JAR
104+
### Command line application JAR
102105
The command-line application JAR has to be downloaded. After that, it can be executed from the command-line
103106
as described in the usage instructions. Java version 17 or higher has to be installed on your machine.
104107

@@ -117,7 +120,7 @@ care of installing all dependencies.
117120
* Apache License, version 2.0
118121

119122
**Managed by Maven:**
120-
* Chemistry Development Kit (CDK) version 2.10
123+
* Chemistry Development Kit (CDK) version 2.12-SNAPSHOT
121124
* [Chemistry Development Kit on GitHub](https://cdk.github.io/)
122125
* License: GNU Lesser General Public License 2.1
123126
* JUnit version 5.10.0
Binary file not shown.

SugarRemovalUtility CMD App/Usage instructions.txt

Lines changed: 31 additions & 11 deletions
@@ -1,20 +1,24 @@
11
---------------------------------------------------------------------------------------------------
2-
Sugar Removal Utility Command Line Application Usage instructions v. 1.5
2+
Sugar Removal Utility Command Line Application Usage instructions v. 1.6
33
---------------------------------------------------------------------------------------------------
44

55
This application can be used to remove sugar moieties from molecules in a given data set, according
66
to "Schaub, J., Zielesny, A., Steinbeck, C., Sorokina, M. Too sweet: cheminformatics for deglycosylation in natural
7-
products. J Cheminform 12, 67 (2020). https://doi.org/10.1186/s13321-020-00467-y".
7+
products. J Cheminform 12, 67 (2020). https://doi.org/10.1186/s13321-020-00467-y". Additionally, it includes the
8+
"Sugar Detection Utility" extension (https://github.com/cdk/cdk/pull/1225) to extract sugar moieties correctly.
89

9-
Accepted input formats: MDL Molfile, MDL Structure data file (SDF), and SMILES file (of
10-
format: [SMILES string][space][name] in each line, see example file)
10+
Accepted input formats: MDL Molfile, MDL Structure data file (SDF), and SMILES file
11+
(determined by file extension; .sdf and .mol will be recognized as SDF and Molfile, respectively; .smi, .smiles, .csv,
12+
.tsv, and .txt will be interpreted as SMILES file).
1113

12-
The output is a comma-separated value (CSV) text file containing:
14+
The output is a comma-separated value (CSV) text file containing (separated by semicolon ;):
1315
- the respective number of each molecule in the input file
1416
- detected identifiers of the molecules
15-
- SMILES strings of the original molcules
16-
- SMILES strings of the deglycosylated molecules
17-
- SMILES strings of the removed sugar moieties
17+
- SMILES strings of the original molecules
18+
- SMILES strings of the deglycosylated molecules (aglycone)
19+
- whether the molecule contained sugar moieties
20+
- SMILES strings of the removed sugar moieties (variable number of columns here depending on how many sugars were removed from
21+
the respective molecule)
1822

1923
The output file and a log file will be created in the same directory as the input file.
2024
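The extension-based input detection described in the "Accepted input formats" paragraph of this hunk can be illustrated with a minimal sketch. It is not the application's actual code: the class, enum, and method names are hypothetical, and case-insensitive matching is an assumption, since the usage text does not say how upper-case extensions are handled.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch; names are illustrative and not taken from the SRU sources. */
final class InputTypeSketch {

    enum InputType { MOLFILE, SDF, SMILES }

    // Extensions the usage instructions list as being interpreted as SMILES files.
    private static final Set<String> SMILES_EXTENSIONS =
            Set.of("smi", "smiles", "csv", "tsv", "txt");

    /** Maps a file name to an input type, or empty if the extension is missing or unknown. */
    static Optional<InputType> detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to inspect
        }
        // Assumption: matching ignores case; the usage text does not specify this.
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (extension.equals("mol")) {
            return Optional.of(InputType.MOLFILE);
        }
        if (extension.equals("sdf")) {
            return Optional.of(InputType.SDF);
        }
        if (SMILES_EXTENSIONS.contains(extension)) {
            return Optional.of(InputType.SMILES);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] names = {"molecules.sdf", "structure.mol", "smiles_test_file.txt", "archive.zip"};
        for (String name : names) {
            System.out.println(name + " -> " + detect(name));
        }
    }
}
```

Returning an empty `Optional` for unknown extensions leaves the decision of how to report an unsupported file to the caller; how the real application reacts in that case is not stated in the instructions.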

@@ -29,7 +33,8 @@ the JAR file. These are (-[short name] --[long name] <argument>):
2933
java -jar SugarRemovalUtility-jar-with-dependencies.jar -i <filePath> -t <integer>
3034
[-glyBond <boolean>] [-remTerm <boolean>] [-presMode <integer>] [-presThres <integer>]
3135
[-oxyAtoms <boolean>] [-oxyAtomsThres <number>] [-linSugInRings <boolean>] [-linSugMinSize <integer>]
32-
[-linSugMaxSize <integer>][-linAcSug <boolean>] [-circSugSpiro <boolean>] [-circSugKetoGroups <boolean>]
36+
[-linSugMaxSize <integer>] [-linAcSug <boolean>] [-circSugSpiro <boolean>] [-circSugKetoGroups <boolean>]
37+
[-markR <boolean>] [-postProcSug <boolean>] [-limitPostProcBySize <boolean>]
3338

3439
- Option -h --help: Print usage and help information regarding the required command-line arguments and
3540
options. If this option is used, the application is exited afterwards.
@@ -57,7 +62,7 @@ java -jar SugarRemovalUtility-jar-with-dependencies.jar -i <filePath> -t <intege
5762
- This option and its argument are always required
5863

5964
-> These two options must ALWAYS be given (except if --help or --version are used). The remaining
60-
arguments are optional. If they are not specified, they will be in their default value.
65+
arguments are optional. If they are not specified, they will be set to their default value.
6166

6267
- Option -glyBond --detectCircSugOnlyWithGlyBond <boolean>: Either "true" or "false", indicating
6368
whether circular sugars should be detected (and removed) only if they have an O-glycosidic bond
@@ -129,14 +134,29 @@ java -jar SugarRemovalUtility-jar-with-dependencies.jar -i <filePath> -t <intege
129134
whether circular sugar-like moieties with keto groups should be detected.
130135
- Any other value of this argument will be interpreted as "false"
131136
- Default: "false"
137+
- Option -markR --markAttachmentPointsByR <boolean>: Either "true" or "false", indicating whether the attachment
138+
points of removed sugar moieties should be marked by R groups (pseudo atoms, * in the exported SMILES codes)
139+
in the deglycosylated core structure and on the extracted sugars.
140+
- Any other value of this argument will be interpreted as "false"
141+
- Default: "false"
142+
- Option -postProcSug --postProcessSugars <boolean>: Either true or false, indicating whether the extracted sugar
143+
moieties should be post-processed, i.e. bond splitting (O-glycosidic, ether, ester, peroxide) to separate the
144+
individual sugars, before being output.
145+
- Any other value of this argument will be interpreted as "false"
146+
- Default: "false"
147+
- Option -limitPostProcBySize --limitPostProcessingBySize <boolean>: Either true or false, indicating whether the
148+
post-processing of extracted sugar moieties should be limited to structures bigger than a defined size (see
149+
preservation mode (threshold)) to preserve smaller modifications.
150+
- Any other value of this argument will be interpreted as "false"
151+
- Default: "false"
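The rule repeated for each boolean option above ("Any other value of this argument will be interpreted as false") has the same semantics as Java's standard `Boolean.parseBoolean`. Whether the application actually uses that method is an assumption; the sketch below only shows the practical consequence, namely that a misspelled value silently disables an option instead of raising an error. The class and method names are hypothetical.

```java
/** Hypothetical sketch; the wrapper name is illustrative, Boolean.parseBoolean is standard Java. */
final class BooleanOptionSketch {

    /** Returns true only for "true" (ignoring case); every other value, including null, yields false. */
    static boolean parseFlag(String argument) {
        return Boolean.parseBoolean(argument);
    }

    public static void main(String[] args) {
        // "ture" is a deliberate typo: it is not rejected, it simply evaluates to false.
        for (String value : new String[] {"true", "false", "yes", "1", "ture"}) {
            System.out.println("-markR \"" + value + "\" -> " + parseFlag(value));
        }
    }
}
```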
132152

133153
Example usages:
134154
- Removing circular AND linear sugars from all molecules in the given example file, using default settings
135155
implicitly:
136156
java -jar "SugarRemovalUtility-jar-with-dependencies.jar" -i "smiles_test_file.txt" -t "3"
137157
- Removing circular AND linear sugars from all molecules in the given example file, using all default settings
138158
explicitly:
139-
java -jar "SugarRemovalUtility-jar-with-dependencies.jar" -i "smiles_test_file.txt" -t "3" -glyBond "false" -remTerm "true" -presMode "1" -presThres "5" -oxyAtoms "true" -oxyAtomsThres "0.5" -linSugInRings "false" -linSugMinSize "4" -linSugMaxSize "7" -linAcSug "false" -circSugSpiro "false" -circSugKetoGroups "false"
159+
java -jar "SugarRemovalUtility-jar-with-dependencies.jar" -i "smiles_test_file.txt" -t "3" -glyBond "false" -remTerm "true" -presMode "1" -presThres "5" -oxyAtoms "true" -oxyAtomsThres "0.5" -linSugInRings "false" -linSugMinSize "4" -linSugMaxSize "7" -linAcSug "false" -circSugSpiro "false" -circSugKetoGroups "false" -markR "false" -postProcSug "false" -limitPostProcBySize "false"
140160
- Printing usage and help information and the version of the application:
141161
java -jar "SugarRemovalUtility-jar-with-dependencies.jar" -h -v
142162
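The semicolon-separated output layout described near the top of these usage instructions has a variable number of trailing sugar columns, which matters when reading the file back in. The following is a minimal sketch of a reader under stated assumptions: the column order follows the bullet list in the instructions, there is no quoting or escaping of semicolons, and the "contained sugars" column is kept as raw text because its exact representation is not documented. All class, record, and field names are hypothetical, and the sample rows use placeholder strings rather than real SMILES.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a reader for one output row; not part of the SRU sources. */
final class OutputRowSketch {

    record Row(String number, String identifier, String originalSmiles,
               String aglyconeSmiles, String hadSugars, List<String> sugarSmiles) { }

    static Row parse(String line) {
        // A limit of -1 keeps trailing empty fields instead of silently dropping them.
        String[] fields = line.split(";", -1);
        if (fields.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 columns, got " + fields.length);
        }
        // Everything after the fifth column is treated as one removed sugar per column.
        List<String> sugars = new ArrayList<>();
        for (int i = 5; i < fields.length; i++) {
            if (!fields[i].isEmpty()) {
                sugars.add(fields[i]);
            }
        }
        return new Row(fields[0], fields[1], fields[2], fields[3], fields[4], List.copyOf(sugars));
    }

    public static void main(String[] args) {
        // Placeholder values only; real rows contain SMILES strings.
        System.out.println(parse("1;ID-1;ORIGINAL;AGLYCONE;true;SUGAR_A;SUGAR_B"));
        System.out.println(parse("2;ID-2;ORIGINAL;ORIGINAL;false"));
    }
}
```

Because rows can differ in length, a fixed-width CSV reader may reject or misalign this file; collecting the trailing columns into a list avoids that. If molecule identifiers can themselves contain semicolons, this simple split would need to be replaced by a proper CSV parser.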

pom.xml

Lines changed: 3 additions & 7 deletions
@@ -30,7 +30,7 @@
3030

3131
<groupId>io.github.jonasschaub</groupId>
3232
<artifactId>sru</artifactId>
33-
<version>1.5.0.0</version>
33+
<version>1.6.0.0</version>
3434
<name>Sugar Removal Utility</name>
3535
<packaging>jar</packaging>
3636
<url>https://github.com/JonasSchaub/SugarRemoval</url>
@@ -46,7 +46,7 @@
4646
<project.build.source>17</project.build.source>
4747
<project.build.target>17</project.build.target>
4848
<java.version>17</java.version>
49-
<cdk.version>2.10</cdk.version>
49+
<cdk.version>2.12-SNAPSHOT</cdk.version>
5050
<junit.version>5.10.0</junit.version>
5151
<hamcrest.version>2.2</hamcrest.version>
5252
<spotless.version>2.40.0</spotless.version>
@@ -100,10 +100,6 @@
100100
</developers>
101101

102102
<distributionManagement>
103-
<snapshotRepository>
104-
<id>ossrh</id>
105-
<url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
106-
</snapshotRepository>
107103
<repository>
108104
<id>ossrh</id>
109105
<url>https://s01.oss.sonatype.org/service/local/staging/deploy/maven2/</url>
@@ -112,7 +108,7 @@
112108
<repositories>
113109
<repository>
114110
<id>ossrh</id>
115-
<url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
111+
<url>https://central.sonatype.com/repository/maven-snapshots/</url>
116112
</repository>
117113
<repository>
118114
<id>oss-sonatype</id>

src/main/java/de/unijena/cheminf/deglycosylation/Main.java

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
3333
* SugarRemovalUtilityCmdApplication class, calls its execute() function and measures the time it takes for execution.
3434
*
3535
* @author Jonas Schaub
36-
* @version 1.5
36+
* @version 1.6
3737
*/
3838
public class Main {
3939
/**
