Updated PTM featured - Improved TF Data pipeline - Prosit Model refactoring - Feature Extraction fixes #86

Merged
omsh merged 9 commits into main from develop on Feb 9, 2026

Conversation

omsh (Collaborator) commented Feb 9, 2026

This PR refactors the Prosit intensity models (TensorFlow + PyTorch) to support configurable PTM and metadata branches, improves the TensorFlow dataset pipeline behavior, fixes lookup-based feature extraction for overlength sequences, and expands tests/docs around datasets and processors.

Changes:

  • Refactor Prosit intensity predictors (TF + Torch) with explicit configuration/validation for PTM features, metadata, and optional instrument embeddings.
  • Update PeptideDataset TF pipeline to control shuffling/batching/prefetching outside of to_tf_dataset, and improve padding/feature-extraction lengths when termini are included.
  • Add/expand test coverage for processors, datasets, and feature extractors; add a datasets guide to the docs.
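
The second bullet moves shuffling, batching, and prefetching out of to_tf_dataset, and one of the commits in this PR names the order as cache, shuffle, batch, prefetch. The sketch below is a framework-free illustration of why shuffling comes before batching. It does not use TensorFlow or any dlomix code, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split items into consecutive chunks of at most batch_size elements."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def shuffle_then_batch(items: Sequence[T], batch_size: int, seed: int) -> List[List[T]]:
    """Shuffle individual examples first, so batch membership can change per seed."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return batch(shuffled, batch_size)


def batch_then_shuffle(items: Sequence[T], batch_size: int, seed: int) -> List[List[T]]:
    """Batch first, so only the order of the batches is shuffled."""
    batches = batch(items, batch_size)
    random.Random(seed).shuffle(batches)
    return batches


examples = list(range(8))
mixed = shuffle_then_batch(examples, batch_size=4, seed=0)
fixed = batch_then_shuffle(examples, batch_size=4, seed=0)
```

With batching first, every batch always contains the same examples; with shuffling first, examples can be regrouped on each pass, which is usually what training wants.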

victorgiurcoiu and others added 8 commits December 9, 2025 17:32
…ument values in datasets (#82)

* dataset guide and minor doc additions

* changed default alphabet value to be None to trigger learning the tokens and be more explicit

* comments

NOTE: this PR breaks previous usage if the user implicitly assumed the alphabet to be ALPHABET_UNMOD. We nevertheless move to a more explicit approach for better transparency and reproducibility.
* cache - shuffle - batch - prefetch order

* fixes related to termini - hf label column future warning

* pr comments + tests
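
The alphabet change in this commit (default alphabet=None triggers learning the tokens) can be pictured roughly as follows. This is a minimal sketch, not the dlomix implementation: the function name resolve_alphabet, the pre-tokenized input, and the choice to start learned indices at 1 (leaving 0 free for padding) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional


def resolve_alphabet(
    sequences: Iterable[Iterable[str]],
    alphabet: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Return the given alphabet, or learn one from already tokenized sequences."""
    if alphabet is not None:
        # An explicitly passed alphabet is used unchanged.
        return dict(alphabet)
    # alphabet=None: collect every distinct token seen in the data.
    tokens = sorted({token for sequence in sequences for token in sequence})
    # Assumption: index 0 is left free for padding, so tokens start at 1.
    return {token: index for index, token in enumerate(tokens, start=1)}


learned = resolve_alphabet([["A", "C", "M[UNIMOD:35]"], ["C", "K"]])
explicit = resolve_alphabet([["A"]], alphabet={"A": 1, "C": 2})
```

Under this sketch, code that relied on a fixed default would need to pass that alphabet explicitly, which matches the breaking-change note above.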
…mproved tests (#84)

* prosit model changes and tests

* fix in feature extraction + tests

* refined run scripts for intensity

* refactored tests
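
The feature-extraction fix in this commit concerns lookup-based features for overlength sequences. A minimal sketch of the "cap, then pad" idea is shown below; the function name lookup_features, its parameters, and the toy lookup table are hypothetical and do not reflect the actual LookupFeatureExtractor API.

```python
from typing import Dict, List, Sequence


def lookup_features(
    tokens: Sequence[str],
    lookup: Dict[str, float],
    max_length: int,
    default: float = 0.0,
    pad_value: float = 0.0,
) -> List[float]:
    """Map tokens to lookup values and return exactly max_length entries."""
    # Cap first: tokens beyond max_length are never looked up.
    capped = tokens[:max_length]
    features = [lookup.get(token, default) for token in capped]
    # Then pad: shorter sequences are filled up to max_length.
    features.extend([pad_value] * (max_length - len(features)))
    return features


toy_table = {"A": 1.0, "B": 2.0}  # illustrative values only
overlength = lookup_features(["A", "B", "A", "B"], toy_table, max_length=3)
short = lookup_features(["A"], toy_table, max_length=3)
```

Capping before padding guarantees a fixed output length, so an overlength sequence cannot produce a feature vector longer than the padded sequence it accompanies.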

Copilot AI left a comment


Reviewed changes

Copilot reviewed 27 out of 31 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_torch_models.py | Updates TF↔Torch equivalence test to pass explicit seq_length. |
| tests/test_torch_dataset.py | Moves test data wiring to fixtures; updates expected batch shapes and termini behavior. |
| tests/test_processors.py | Adds comprehensive tests for encoding, PTM removal, function processor, and edge cases. |
| tests/test_models.py | Aligns expected error behavior for missing PTM/metadata inputs; adds “no metadata” case. |
| tests/test_feature_extractors.py | Adds tests for LookupFeatureExtractor padding/truncation behavior. |
| tests/test_datasets.py | Switches to fixtures for assets/data; adds TF dataset label-shape test coverage. |
| tests/conftest.py | Replaces global pytest variables with explicit fixtures; adds alphabets and helper fixtures. |
| src/dlomix/models/prosit_torch.py | Major refactor of Torch Prosit intensity predictor (configurable inputs, PTM/meta branches). |
| src/dlomix/models/prosit.py | Major refactor of TF Prosit intensity predictor (configurable inputs, PTM/meta branches). |
| src/dlomix/data/retention_time.py | Changes default alphabet behavior to allow alphabet learning (alphabet=None). |
| src/dlomix/data/processing/processors.py | Updates docs example output for parsing to use []- (docstring change). |
| src/dlomix/data/processing/pickled_feature_dicts/saved_loss_atoms.pkl | Adds/updates serialized lookup data for PTM-related feature extraction. |
| src/dlomix/data/processing/pickled_feature_dicts/saved_gained_atoms.pkl | Adds/updates serialized lookup data for PTM-related feature extraction. |
| src/dlomix/data/processing/pickled_feature_dicts/mz_diff.pkl | Adds/updates serialized lookup data for PTM-related feature extraction. |
| src/dlomix/data/processing/feature_extractors.py | Fixes lookup feature extraction to cap lookup at max_length before padding. |
| src/dlomix/data/ion_mobility.py | Changes default alphabet behavior to allow alphabet learning (alphabet=None). |
| src/dlomix/data/fragment_ion_intensity.py | Changes default alphabet behavior to allow alphabet learning (alphabet=None). |
| src/dlomix/data/detectability.py | Changes default alphabet behavior to allow alphabet learning (alphabet=None). |
| src/dlomix/data/dataset_config.py | Adds post-init validation and normalizes label_column to list form. |
| src/dlomix/data/dataset.py | Adjusts padding/feature lengths with termini; revises TF dataset shuffle/batch/prefetch flow. |
| src/dlomix/data/charge_state.py | Changes default alphabet behavior to allow alphabet learning (alphabet=None). |
| src/dlomix/_metadata.py | Bumps version to 0.2.4 and updates copyright year. |
| run_scripts/run_prosit_intensity_torch.py | Updates example script to use learned alphabet from dataset. |
| run_scripts/run_prosit_intensity_ptms_torch.py | Updates PTM Torch example script (seq len, features, termini, learned alphabet). |
| run_scripts/run_prosit_intensity_ptms.py | Updates PTM TF example script to use dataset-learned alphabet and meta branch. |
| run_scripts/run_prosit_intensity.py | Updates TF training example to use meta branch, termini, learned alphabet, and .keras checkpoint path. |
| docs/notes/dataset_guide.rst | Adds a comprehensive datasets guide (new). |
| docs/index.rst | Adds the dataset guide into the docs structure/navigation. |
| docs/dlomix.rst | Updates package docs TOC to include callbacks; removes some module listings. |
| docs/dlomix.callbacks.rst | Adds Sphinx page for dlomix.callbacks (new). |
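
The dataset_config.py entry mentions post-init validation and normalizing label_column to list form. One common way to do this in Python is a dataclass with a __post_init__ hook, sketched below. The class ExampleDatasetConfig, its fields, and the specific checks are hypothetical stand-ins and are not taken from the dlomix source.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ExampleDatasetConfig:  # hypothetical stand-in, not the dlomix class
    sequence_column: str
    label_column: Union[str, List[str]]
    max_seq_len: int = 30

    def __post_init__(self) -> None:
        # Normalize: a single column name becomes a one-element list.
        if isinstance(self.label_column, str):
            self.label_column = [self.label_column]
        else:
            self.label_column = list(self.label_column)
        # Validate: fail early on values that cannot work downstream.
        if not self.label_column:
            raise ValueError("label_column must name at least one column")
        if self.max_seq_len <= 0:
            raise ValueError("max_seq_len must be a positive integer")


config = ExampleDatasetConfig(sequence_column="sequence", label_column="intensities")
print(config.label_column)  # ['intensities']
```

Normalizing once at construction lets the rest of the code treat label_column as a list without checking its type at each use.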


omsh merged commit 3631902 into main on Feb 9, 2026
3 checks passed