Summary
November 25th update consolidating infrastructure improvements, Scimilarity model support, RNA inverse folding analysis, and comprehensive code quality enhancements.
Scale: 208 files changed, 42,130 insertions(+), 3,924 deletions(-)
Key Changes
Infrastructure & Developer Experience
New Features
Code Quality
Refactoring
- Two-phase initialization (`__init__` + `setup()`)
- `SequenceBackboneOutput` dataclass
- `ClassificationDataModule` base class
- `rename_cols`
- `@once_only` decorator for reliability
Cleanup
Testing
Breaking Changes
None for most users - changes are backward compatible via legacy adapter system.
Custom backbone/task/data module developers will need to implement new required methods. See migration guide in PR_SUMMARY_nov25.md.
Migration Notes for Contributors
- Run `pre-commit install` to enable new commit hooks
- Use `poetry lock` instead of `pip-compile` for dependency updates
For Developers
- Backbones: `setup()`, `process_batch()`, `required_data_columns()`
- Tasks: `@once_only`, extract `.last_hidden_state`, implement `required_data_columns(stage)`
- Data modules: `provided_columns()` property
🔧 Refactored Components
1. Backbone Architecture Refactoring (1,996 lines)
Old vs New Pattern
BEFORE:
AFTER:
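A minimal sketch of the AFTER pattern, assuming plain Python lists in place of tensors. `ToyBackbone`, its `hidden_size` parameter, and the `forward()` signature are hypothetical stand-ins, not the project's classes. Only `setup()`, `SequenceBackboneOutput`, and `.last_hidden_state` are names taken from this description, and the real dataclass may carry more fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SequenceBackboneOutput:
    # Only .last_hidden_state is named in this PR; other fields are unknown here.
    last_hidden_state: List[List[float]]


class ToyBackbone:
    """Hypothetical backbone illustrating two-phase initialization."""

    def __init__(self, hidden_size: int = 4):
        # Phase 1: store configuration only; nothing heavy is loaded here.
        self.hidden_size = hidden_size
        self.weights: Optional[List[float]] = None

    def setup(self) -> None:
        # Phase 2: load models/weights (a constant vector stands in for them).
        if self.weights is None:
            self.weights = [0.5] * self.hidden_size

    def forward(self, token_ids: List[int]) -> SequenceBackboneOutput:
        if self.weights is None:
            raise RuntimeError("call setup() before forward()")
        hidden = [[tid * w for w in self.weights] for tid in token_ids]
        return SequenceBackboneOutput(last_hidden_state=hidden)


backbone = ToyBackbone(hidden_size=2)
backbone.setup()
output = backbone.forward([1, 2, 3])
hidden = output.last_hidden_state  # callers unwrap the structured output
```

Keeping `__init__()` cheap means a backbone can be constructed from configuration alone, with the expensive load deferred until `setup()` runs.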
Key Changes
1. Two-Phase Initialization
- `__init__()`: Store configuration only
- `setup()`: Load actual models/weights
2. Structured Output: `SequenceBackboneOutput`
3. Embedding Caching Infrastructure
- `_CacheProfiler` tracks hits/misses
4. Required Data Columns
Migration Impact
- Implement `setup()`, `process_batch()`, `required_data_columns()`
- Call `.setup()` before use
2. Data Module Refactoring (1,522 lines)
Architecture Change
BEFORE:
AFTER:
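As a rough illustration of the column handling described under Key Improvements, the sketch below converts the old parallel lists into a single `rename_cols` mapping and accepts `x_col` as either a string or a list. The helper names (`normalize_x_col`, `to_rename_cols`, `apply_rename`), the example column names, and the mapping direction (source column to new name) are assumptions made for this example, not the project's API.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_x_col(x_col: Union[str, List[str]]) -> List[str]:
    """Accept one column name or a list of names (multi-input support)."""
    return [x_col] if isinstance(x_col, str) else list(x_col)


def to_rename_cols(
    extra_cols: Optional[List[str]],
    extra_col_aliases: Optional[List[str]],
) -> Dict[str, str]:
    """Convert the old parallel lists into one dictionary mapping."""
    extra_cols = extra_cols or []
    extra_col_aliases = extra_col_aliases or []
    if len(extra_cols) != len(extra_col_aliases):
        raise ValueError("extra_cols and extra_col_aliases differ in length")
    return dict(zip(extra_cols, extra_col_aliases))


def apply_rename(row: Dict[str, Any], rename_cols: Dict[str, str]) -> Dict[str, Any]:
    """Rename the keys of one record; unmapped keys pass through unchanged."""
    return {rename_cols.get(key, key): value for key, value in row.items()}


rename_cols = to_rename_cols(["seq", "y"], ["sequences", "labels"])
row = apply_rename({"seq": "ACGU", "y": 1, "id": 7}, rename_cols)
x_cols = normalize_x_col("sequences")
```

A dictionary makes a length mismatch between names and aliases impossible by construction, which is one reason it reads more clearly than two parallel lists.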
Key Improvements
1. Class Hierarchy
2. Unified Column Handling
- `x_col`: Can be string or list (multi-input support)
- `rename_cols`: Dictionary mapping (clearer than parallel lists)
3. New Features
- `provided_columns()` property: Declares available columns
- `generate_uid` parameter: Auto-generates unique IDs for caching
- `class_weight` property: Automatic class weighting for imbalanced datasets
Migration Impact
- Replace `extra_cols` + `extra_col_aliases` with `rename_cols`
- Implement `provided_columns()`
3. Task System Refactoring (1,133 lines)
Initialization Pattern Change
BEFORE:
AFTER:
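One way a decorator such as `@once_only` could guard `configure_model()` against repeated calls is sketched below. This is an assumption about its behavior, not the project's implementation; `ToyTask` and the per-instance flag attribute are hypothetical.

```python
import functools


def once_only(method):
    """Run `method` at most once per instance; later calls are no-ops."""
    flag = f"_{method.__name__}_has_run"

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if getattr(self, flag, False):
            return None
        result = method(self, *args, **kwargs)
        # Mark as done only after success, so a failed call can be retried.
        setattr(self, flag, True)
        return result

    return wrapper


class ToyTask:
    """Hypothetical task whose model configuration must not run twice."""

    def __init__(self):
        self.configure_calls = 0
        self.head = None

    @once_only
    def configure_model(self):
        self.configure_calls += 1
        self.head = "linear-head"


task = ToyTask()
task.configure_model()
task.configure_model()  # ignored: already configured
```

Storing the flag on the instance rather than on the function keeps separate task objects independent of each other.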
Key Changes
1. `@once_only` Decorator
- `configure_model()` calls
2. Backbone Created in `__init__`
- `setup()`
3. Unified Batch Processing
4. Data-Dependent Loss Configuration
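A hedged sketch of what data-dependent loss configuration could look like: the loss weights are derived from the data module's `class_weight` property once the labels are known. The inverse-frequency formula, the `ToyDataModule` and `ToyClassifier` classes, and passing the data module into `configure_model()` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def balanced_class_weights(labels: List[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


class ToyDataModule:
    """Hypothetical data module exposing a class_weight property."""

    def __init__(self, labels: List[int]):
        self.labels = list(labels)

    @property
    def class_weight(self) -> Dict[int, float]:
        return balanced_class_weights(self.labels)


class ToyClassifier:
    """Hypothetical task whose loss weights depend on the data."""

    def __init__(self):
        self.loss_weights: Optional[Dict[int, float]] = None

    def configure_model(self, data_module: ToyDataModule) -> None:
        # The loss can only be configured once the label distribution is known.
        self.loss_weights = data_module.class_weight


dm = ToyDataModule([0, 0, 0, 1])
clf = ToyClassifier()
clf.configure_model(dm)
```

With three samples of class 0 and one of class 1, the rarer class receives the larger weight.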
5. Stage-Specific Data Requirements
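The sketch below shows how `required_data_columns(stage)` on a task and `provided_columns` on a data module could be checked against each other. The stage names (`"fit"`, `"predict"`), the `"labels"` column, and the `missing_columns` helper are hypothetical.

```python
from typing import List


class ToySequenceTask:
    """Hypothetical task declaring the columns it needs at each stage."""

    def required_data_columns(self, stage: str) -> List[str]:
        if stage == "predict":
            return ["sequences"]  # no labels needed for inference
        return ["sequences", "labels"]


class ToyColumnsDataModule:
    """Hypothetical data module declaring the columns it can supply."""

    def __init__(self, columns: List[str]):
        self._columns = list(columns)

    @property
    def provided_columns(self) -> List[str]:
        return list(self._columns)


def missing_columns(task, data_module, stage: str) -> List[str]:
    """Return required columns that the data module does not provide."""
    provided = set(data_module.provided_columns)
    return [c for c in task.required_data_columns(stage) if c not in provided]


task = ToySequenceTask()
unlabeled = ToyColumnsDataModule(["sequences"])
missing_for_fit = missing_columns(task, unlabeled, "fit")
missing_for_predict = missing_columns(task, unlabeled, "predict")
```

Declaring requirements per stage lets an unlabeled dataset be used for prediction while still failing early if it is used for training.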
Migration Impact
- Update `transform()` to use `process_batch()`
- `forward()` must extract `.last_hidden_state` from `SequenceBackboneOutput`
- Implement `required_data_columns(stage)`
- `configure_model()` if data-dependent
4. Adapter System Changes (215 lines)
Changes
Migration Impact
5. Documentation System Refactoring (776 lines)
Problem Being Solved
OLD SITUATION:
NEW SOLUTION:
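To make the runtime half of this idea concrete, here is a minimal sketch of a metaclass that exposes parent `__init__` parameters on subclasses that forward `**kwargs`. It is not the project's `GoogleKwargsDocstringInheritanceInitMeta`: the class name `KwargsSignatureMeta` and the example classes are hypothetical, and the sketch only sets `__signature__`, leaving `__doc__` and `__griffe_signature__` untouched.

```python
import inspect


class KwargsSignatureMeta(type):
    """Hypothetical metaclass: surface parent parameters hidden by **kwargs."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        init = namespace.get("__init__")
        if init is None or not bases:
            return
        own = list(inspect.signature(init).parameters.values())[1:]  # drop self
        if not any(p.kind is inspect.Parameter.VAR_KEYWORD for p in own):
            return
        explicit = [p for p in own if p.kind is not inspect.Parameter.VAR_KEYWORD]
        seen = {p.name for p in explicit}
        variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        # Inherited parameters become keyword-only so the merged order is valid.
        inherited = [
            p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
            for p in inspect.signature(bases[0]).parameters.values()
            if p.name not in seen and p.kind not in variadic
        ]
        cls.__signature__ = inspect.Signature(explicit + inherited)


class Base(metaclass=KwargsSignatureMeta):
    def __init__(self, hidden_size=8, dropout=0.1):
        self.hidden_size = hidden_size
        self.dropout = dropout


class Child(Base):
    def __init__(self, num_labels=2, **kwargs):
        super().__init__(**kwargs)
        self.num_labels = num_labels


param_names = list(inspect.signature(Child).parameters)
```

Tools that rely on `inspect.signature()` then see the forwarded parameters by name instead of an opaque `**kwargs`.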
How It Works
Two-Part System:
Runtime:
- `GoogleKwargsDocstringInheritanceInitMeta` metaclass
- `__init__` signatures
- `__signature__` and `__doc__`
- `__griffe_signature__` for docs
Build Time:
- `KwargsDocstringInheritance` Griffe extension
- `__griffe_signature__` during doc generation
Benefits
Used Throughout Codebase
- `SequenceBackboneInterface` subclasses
- `**kwargs`
6. Test Infrastructure Changes (868 lines)
New Tests
File: `tests/backbones/test_base.py` (311 lines)
1. `SequenceBackboneOutput` Tests
2. Caching Tests
3. Fixture Updates
Benefits
📊 Breaking Changes & Migration Guide
For End Users
- `output = backbone(...)` returns `Tensor`: now returns `SequenceBackboneOutput`; use `.last_hidden_state`
- `.setup()`: call `backbone.setup()` before use
- `extra_cols` + `extra_col_aliases`: use `x_col` list + `rename_cols` dict
- `"sequences"`: `rename_cols`
Example Migration
OLD CODE:
NEW CODE:
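For calling code that has to run against both the old and new backbone outputs during a transition, a small compatibility helper is one option. The sketch below is only an illustration: `unwrap_hidden_state` is a hypothetical helper, and the `SequenceBackboneOutput` stand-in models only the `.last_hidden_state` attribute named in this description.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SequenceBackboneOutput:
    # Stand-in for the real dataclass; only this attribute is assumed.
    last_hidden_state: List[float]


def unwrap_hidden_state(output: Any) -> Any:
    """Return .last_hidden_state when present, else the output unchanged."""
    return getattr(output, "last_hidden_state", output)


old_style = [0.1, 0.2]                          # old: bare tensor-like value
new_style = SequenceBackboneOutput([0.1, 0.2])  # new: structured output

old_hidden = unwrap_hidden_state(old_style)
new_hidden = unwrap_hidden_state(new_style)
```

Once all call sites read `.last_hidden_state` directly, a helper like this can be dropped.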
Files Changed
1. Development Infrastructure Modernization
Pre-commit Hooks Integration
- `.pre-commit-config.yaml` with:
- `CONTRIBUTING.md` with pre-commit setup instructions
Dependency Management Migration
- Migration from `pip-compile` to Poetry
- `poetry.lock` file (7,934 lines) for reproducible builds
Docker Updates
- `RUN` command in Dockerfile (line 21)
2. New Model Support: Scimilarity
Model Integration
- `modelgenerator/huggingface_models/scimilarity/nn_models.py` (206 lines)
- `modelgenerator/huggingface_models/scimilarity/model_v1.1/layer_sizes.json`
- `modelgenerator/cell/gene_lists/scimilarity_genes.tsv` (28,232 gene names)
- `modelgenerator/cell/utils.py` (109 line changes)
3. RNA Inverse Folding Enhancements
New Zero-shot Analysis Pipeline
- `modelgenerator/rna_inv_fold/zeroshot_analyses.py` (163 lines)
RNA Task Updates
- `modelgenerator/rna_ss/rna_ss_task.py` (51 line changes)
- `modelgenerator/rna_ss/rna_ss_data.py` (11 line changes)
- `modelgenerator/rna_inv_fold/data_inverse_folding/dataset.py`
4. Code Quality & Documentation
Documentation System Enhancements
- `modelgenerator/utils/kwargs_doc.py` (465 lines)
- `modelgenerator/utils/griffe_kwargs_extension.py` (311 lines)
- `GoogleKwargsDocstringInheritanceInitMeta` for automatic kwargs parameter inheritance
- `docs/docs/usage/embedding_caching.md`
- (`x_col`, `y_col`, `rename_cols`)
Code Formatting
5. Core Architecture Refactoring
Backbone Architecture (1,996 lines changed)
- `modelgenerator/backbones/backbones.py` (1,333 line changes)
- `modelgenerator/backbones/base.py` (663 line changes)
- `modelgenerator/backbones/__init__.py` (127 line changes)
Data Module Refactoring (1,522 lines changed)
- `modelgenerator/data/data.py` (1,522 line changes)
- `modelgenerator/data/__init__.py` (380 line changes)
- `modelgenerator/data/base.py` (33 line changes)
Task System Updates (1,133 lines changed)
- `modelgenerator/tasks/tasks.py` (1,133 line changes)
- `modelgenerator/tasks/base.py` (159 line changes)
Adapter Updates (215 lines changed)
- `modelgenerator/adapters/adapters.py` (79 line changes)
- `modelgenerator/adapters/fusion.py` (136 line changes)
6. Experiment Configuration Updates
New Configurations
- `experiments/AIDO.Protein/xTrimo/configs/stability_prediction.yaml`
Configuration Updates
7. Testing Infrastructure
Test Organization
- Added `__init__.py` files to test directories for better organization
- `tests/backbones/test_base.py` (311 lines)
- `tests/conftest.py` (57 line changes)