This release adds support for regression tasks and for the MoleculeNet datasets used in Tumescheit et al.: Ontology pre-training improves machine learning-based predictions for metabolites (bioRxiv, 2025) (#130). For our new ensemble (Flügel et al.: Chebifier 2: An ensemble for chemistry (SymGenAI4Sci Workshop, 2025)), SMILES canonicalisation is now applied by default (#118), and logistic regression and LSTM models are supported (#127). See below for a full list of changes.
What's Changed
- Add classwise F1 scores to the class props generation script by @aditya0by0 in #112
- Add SMILES canonicalisation, update tokens.txt by @sfluegel05 in #118
- Allow negative samples (and regulate their amount for partial data) by @sfluegel05 in #116
- Fix data splits for PubChem by @sfluegel05 in #119
- Tests for the CLI and changes to project dependencies by @aditya0by0 in #105
- Split dependencies into inference (mandatory), training and linters by @aditya0by0 in #114
- Data augmentation: SMILES by @aditya0by0 in #115
- Enable setting persistent_workers through the CLI by @aditya0by0 in #126
- Avoid using iterrows, use vectorization wherever possible by @aditya0by0 in #120
- Fix issue in MLP architecture by @aditya0by0 in #128
- Fix: evaluation skips last batch by @jcapp4 in #129
- Feature/new ensemble models by @sfluegel05 in #127
- Fix for weighted BCE loss by @aditya0by0 in #132
- Enable loading pretrained weights for the MLP model by @aditya0by0 in #133
- Avoid generation of original SMILES in augmentation by @aditya0by0 in #136
- New regression and classification datasets for ontology pre-training by @schnamo in #130
New Contributors
Full Changelog: v1.0.3...v1.1.0