Implement Typer + Hydra Configuration Architecture #147

JemmaLDaniel · 2025-11-26T18:14:22Z

Summary

This PR implements the Typer + Hydra hybrid architecture proposed in #146, refactoring Winnow's configuration management from flat CLI signatures to a flexible, hierarchical system that enables scalable configuration of complex nested components and automatic object instantiation.

Implementation Details

1. Typer + Hydra Hybrid Architecture

Typer now acts as a thin command dispatcher, passing all configuration to Hydra:

def train(ctx: typer.Context) -> None:
    """Passes control directly to the Hydra training pipeline."""
    overrides = ctx.args if ctx.args else None
    train_entry_point(overrides)

Pipeline logic moved to `train_entry_point()` and `predict_entry_point()` functions that handle Hydra initialization, configuration composition and pipeline execution.

2. Structured Configuration with Composition

Created modular configuration structure in config/:

train.yaml / predict.yaml - Main pipeline configurations
calibrator.yaml - Model architecture and features
residues.yaml - Amino acid masses and modifications (shared via composition)
data_loader/ - Pluggable dataset format loaders (InstaNovo, MZTab, PointNovo, Winnow)
fdr_method/ - Pluggable FDR methods (nonparametric, database-grounded)

Configuration files use Hydra's defaults mechanism to compose shared components.

3. Hydra-Based Object Instantiation

Used Hydra's _target_ field for automatic instantiation:

Data loaders instantiated from configuration without manual if/elif logic
FDR methods selected and configured via YAML
Users can inject custom implementations by creating YAML configs with _target_ pointing to their classes
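
To illustrate the `_target_` mechanism described above, here is a minimal, standard-library-only sketch of the idea. It is not Hydra's implementation (`hydra.utils.instantiate` also handles nested and partial instantiation, among other things). The helper name `instantiate_from_config` and the example target are assumptions chosen for illustration.

```python
import importlib
from typing import Any, Mapping


def instantiate_from_config(cfg: Mapping[str, Any]) -> Any:
    """Build an object from a mapping that carries a dotted ``_target_`` path.

    Hypothetical helper for illustration; not part of Hydra or Winnow.
    """
    kwargs = dict(cfg)
    target = kwargs.pop("_target_")  # e.g. "package.module.ClassName"
    module_name, _, attr_name = target.rpartition(".")
    # Import the module, then look up the class (or any callable) by name.
    factory = getattr(importlib.import_module(module_name), attr_name)
    # Every remaining key is passed as a keyword argument.
    return factory(**kwargs)


# A stdlib class stands in for a data loader or FDR method.
step = instantiate_from_config({"_target_": "datetime.timedelta", "minutes": 90})
print(step)  # 1:30:00
```

Because the class is named in configuration rather than in code, swapping implementations needs no if/elif dispatch: only the `_target_` string changes.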

4. Configuration Inspection Commands

Added winnow config command group:

winnow config train - Display resolved training configuration
winnow config predict - Display resolved prediction configuration

Implemented custom ConfigFormatter class with hierarchical colour-coding based on YAML nesting depth for improved terminal readability.
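
As a rough idea of what depth-based colour-coding can look like, the sketch below colours each YAML key by its indentation level using ANSI escape codes. It is an assumption-laden illustration, not the `ConfigFormatter` from this PR: the function name `colourise_yaml`, the palette, and the two-space indent width are all hypothetical.

```python
import re

# ANSI colour codes cycled by nesting depth (the palette is arbitrary).
DEPTH_COLOURS = ["\033[36m", "\033[32m", "\033[33m", "\033[35m"]
RESET = "\033[0m"


def colourise_yaml(text: str, indent_width: int = 2) -> str:
    """Colour each ``key:`` in a YAML string according to its indentation depth."""
    out = []
    for line in text.splitlines():
        match = re.match(r"^(\s*)([^\s:#][^:]*):(.*)$", line)
        if match is None:
            out.append(line)  # list items, comments and blank lines stay as-is
            continue
        indent, key, rest = match.groups()
        depth = len(indent) // indent_width
        colour = DEPTH_COLOURS[depth % len(DEPTH_COLOURS)]
        out.append(f"{indent}{colour}{key}{RESET}:{rest}")
    return "\n".join(out)


sample = "calibrator:\n  seed: 42\n  hidden_layer_sizes:\n    - 100\n    - 50"
print(colourise_yaml(sample))
```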

5. Lazy Imports for CLI Performance

Implemented lazy import pattern using TYPE_CHECKING to defer heavy dependencies (PyTorch, InstaNovo, etc.) until command execution. This makes --help and config commands respond instantly whilst pipeline commands still have access to all required dependencies.

Added module-level docstring in main.py explaining the rationale.
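
The lazy import pattern can be sketched as follows. The stdlib `decimal` module stands in for a heavy dependency such as PyTorch, and the function names are hypothetical; the actual layout of main.py may differ.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only; this branch never runs at runtime.
    import decimal  # stand-in for a heavy dependency


def show_help() -> str:
    """Cheap command: triggers no heavy imports."""
    return "usage: tool [train|predict|config]"


def run_pipeline(value: str) -> "decimal.Decimal":
    """Expensive command: pays the import cost only when it is invoked."""
    import decimal  # deferred until the command actually runs

    return decimal.Decimal(value)


print(show_help())
print(run_pipeline("1.5"))
```

The return annotation is written as a string so that it stays valid at runtime even though the module is only imported for type checking at module level.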

6. Documentation Updates

Minor improvements to CLI help text and documentation to reflect the new Hydra-based configuration system with examples of dot-notation overrides.

Migration Notes

Existing users will need to:

Use configuration files in config/ instead of passing all parameters via CLI flags
Override parameters using dot notation: winnow train calibrator.seed=42
Consult winnow config <pipeline> to inspect resolved configurations
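
For readers new to dot-notation overrides, here is a simplified sketch of how an override such as `calibrator.seed=42` maps onto a nested configuration. It mirrors two Hydra behaviours in reduced form: a plain override must target an existing key, and a `+` prefix adds a new one. The helper `apply_override` is hypothetical, and the value parsing is only an approximation of Hydra's override grammar.

```python
import ast
from typing import Any


def apply_override(cfg: dict, override: str) -> None:
    """Apply one ``key.path=value`` override to a nested dict in place."""
    append = override.startswith("+")
    path, _, raw = override.lstrip("+").partition("=")
    *parents, leaf = path.split(".")

    node = cfg
    for name in parents:
        # With "+", missing parents are created; otherwise they must exist.
        node = node.setdefault(name, {}) if append else node[name]

    if not append and leaf not in node:
        raise KeyError(f"Could not override '{path}'; use +{path}=... to add it")

    try:
        value: Any = ast.literal_eval(raw)  # numbers, lists, quoted strings
    except (ValueError, SyntaxError, TypeError):
        value = raw  # fall back to a plain string, e.g. a file path
    node[leaf] = value


cfg = {"calibrator": {"seed": 0, "hidden_layer_sizes": [50]}}
apply_override(cfg, "calibrator.seed=42")
apply_override(cfg, "calibrator.hidden_layer_sizes=[100,50,25]")
print(cfg)
```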

JemmaLDaniel · 2025-11-26T18:16:39Z

Commits 20ee8b3 and 2529582 also address #143 and #140

BioGeek · 2025-12-01T09:18:15Z

docs/cli.md

+winnow train data_loader=mztab model_output_dir=models/my_model
+
+# Specify dataset paths
+winnow train dataset.spectrum_path_or_directory=data/spectra.parquet dataset.predictions_path=data/preds.csv


When I try this:

winnow train dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv

I get:

[...] ConfigCompositionException: Could not override 'dataset.predictions'. To append to your config use +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv

When I try that:

winnow train dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv

I get:

[...]
/home/j-vangoey/code/winnow/winnow/datasets/data_loaders.py:62 in _load_beam_preds

  59         Returns:
  60             Tuple[pl.DataFrame, pl.DataFrame]: A tuple containing the predictions and be
  61         """
❱ 62         if predictions_path.suffix != ".csv":
  63             raise ValueError(
  64                 f"Unsupported file format for InstaNovo beam predictions: {predictions_p
  65             )

locals: predictions_path = 'data/predictions.csv'

AttributeError: 'str' object has no attribute 'suffix'

It should be dataset.predictions_path, not dataset.predictions.

I see predictions_path was coming in from the config as a string, so I'll convert to a Path before file loading.
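
A minimal sketch of that kind of fix, assuming the loader accepts either a string or a `Path`: coerce first, then check the extension. The function name `validate_predictions_path` and the error message are hypothetical, not the code in data_loaders.py.

```python
from pathlib import Path
from typing import Union


def validate_predictions_path(predictions_path: Union[str, Path]) -> Path:
    """Coerce to ``Path`` before using path attributes such as ``.suffix``."""
    # Values read from YAML or CLI overrides arrive as plain strings.
    predictions_path = Path(predictions_path)
    if predictions_path.suffix != ".csv":
        raise ValueError(
            f"Expected a .csv predictions file, got '{predictions_path.suffix}'"
        )
    return predictions_path


print(validate_predictions_path("data/predictions.csv").suffix)  # .csv
```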

BioGeek · 2025-12-01T09:20:07Z

docs/cli.md

-    --output-folder ./predictions
+```bash
+# Change MLP architecture
+winnow train calibrator.hidden_layer_sizes=[100,50,25]


When I try this

winnow train calibrator.hidden_layer_sizes=[100,50,25]

I get:

zsh: no matches found: calibrator.hidden_layer_sizes=[100,50,25]

Hmm, strange. This works fine for me
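
One possible explanation, offered as an assumption rather than a confirmed diagnosis: zsh treats unquoted square brackets as a glob pattern and, with its default settings, aborts with "no matches found" when nothing matches, whereas bash by default passes an unmatched pattern through literally. Quoting the override sidesteps the difference. The snippet below uses `shlex.quote` to produce a shell-safe form of the argument.

```python
import shlex

override = "calibrator.hidden_layer_sizes=[100,50,25]"

# shlex.quote wraps the argument in single quotes because it contains
# characters ([ and ]) that a shell may interpret.
command = "winnow train " + shlex.quote(override)
print(command)  # winnow train 'calibrator.hidden_layer_sizes=[100,50,25]'
```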

BioGeek · 2025-12-01T09:24:43Z

docs/cli.md


-### InstaNovo Configuration
+# Specify dataset paths
+winnow predict dataset.spectrum_path_or_directory=data/spectra.parquet dataset.predictions_path=data/preds.csv


Same comment as earlier. When I try this:

winnow predict dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv

I get:

[...] ConfigCompositionException: Could not override 'dataset.predictions'. To append to your config use +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv

and when I try that:

winnow predict dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv

I get

[...] AttributeError: 'str' object has no attribute 'suffix'

BioGeek · 2025-12-01T09:47:16Z

docs/cli.md

+
+Winnow supports multiple input formats:
+
+- **InstaNovo**: Parquet spectra + CSV predictions (beam search format)


Most people will have their input data in *.MGF so I think it would be good to either point to instructions on how to use instanovo convert to convert *.MGF to *.parquet or to add functionality to do that in winnow on the fly.

Very good point! I will address mgf file loading in a new PR, and I can add a bit on this in the docs as a patch for now

BioGeek · 2025-12-01T09:50:27Z

docs/cli.md

+# Predict using pretrained model, InstaNovo predictions and default settings
+winnow predict \
+    dataset.spectrum_path_or_directory=data/test_spectra.parquet \
+    dataset.predictions_path=data/test_predictions.csv


Maybe add small sample test_spectra.parquet and test_predictions.csv files to the repo (or add them to a new release as assets and add a file to download them) so that people quickly have some sample files to play around with.

BioGeek · 2025-12-01T09:52:27Z

docs/configuration.md

+
+```bash
+# Train with default settings
+winnow train


Training with default settings gives me:

[...] FileNotFoundError: No such file or directory (os error 2): data/spectra.ipc

BioGeek · 2025-12-01T09:53:29Z

docs/configuration.md

+winnow train
+
+# Predict with default settings
+winnow predict


Same here.

FileNotFoundError: No such file or directory (os error 2): data/spectra.ipc

BioGeek · 2025-12-01T11:27:15Z

winnow/scripts/main.py

+    from hydra.utils import instantiate
+
+    with initialize(
+        config_path="../../config", version_base="1.3", job_name="winnow_train"


This config_path="../../config" won't work when we distribute the package via PyPI. To confirm:

Build the winnow-fdr package.

$ uv build
Building source distribution...
[...]
Successfully built dist/winnow_fdr-1.0.3.tar.gz
Successfully built dist/winnow_fdr-1.0.3-py3-none-any.whl

Install this wheel

$ cd /tmp
$ uv init winnow_demo
Initialized project `winnow-demo` at `/tmp/winnow_demo`
$ cd winnow_demo
$ uv add ~/code/winnow/dist/winnow_fdr-1.0.3-py3-none-any.whl
Using CPython 3.13.6
Creating virtual environment at: .venv
Resolved 167 packages in 1.94s
Prepared 23 packages in 2m 23s
[...]
$ source .venv/bin/activate
$ winnow config train
[...]
MissingConfigException: Primary config directory not found.
Check that the config directory '/tmp/winnow_demo/.venv/lib/python3.13/site-packages/config' exists and readable

We have had the same problem in InstaNovo. The solution is to move your config folder inside the winnow folder

winnow/
    config/
        data_loader
            instanovo.yaml
        [...]

and then use importlib:

from importlib.resources import files


def train_entry_point(overrides=None, execute=True):
    from hydra import initialize, compose
    from hydra.utils import instantiate
    from hydra.core.global_hydra import GlobalHydra

    # Reset Hydra if called multiple times in same process
    GlobalHydra.instance().clear()

    # Resolve config directory inside package
    config_dir = files("winnow").joinpath("config")

    with initialize(
        config_path=str(config_dir),
        version_base="1.3",
        job_name="winnow_train",
    ):
        cfg = compose(config_name="train", overrides=overrides)

    if not execute:
        print_config(cfg)
        return

In response to the config path issue, I have made the following changes:

Moved configs inside package - Configs are now in winnow/configs/ as suggested

Used importlib.resources.files() - Implemented get_config_dir() in winnow/scripts/config_path_utils.py that uses files("winnow").joinpath("configs") for package mode

Added package data - Updated pyproject.toml to include configs in the built package

Switched to initialize_config_dir() - Changed from initialize(config_path=...) to initialize_config_dir(config_dir=...) to handle absolute paths correctly

The solution includes a fallback to dev mode when running from a cloned repo, and also adds support for custom config directories with partial overrides.

Tested and confirmed working when installed from a wheel from my side. Let me know what you think!
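
For reference, the package-first lookup with a dev-mode fallback described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in config_path_utils.py: the `repo_root` parameter and the helper `first_existing_dir` are hypothetical, and support for custom config directories with partial overrides is left out.

```python
from importlib.resources import files
from pathlib import Path
from typing import Iterable, Optional


def first_existing_dir(candidates: Iterable[Path]) -> Path:
    """Return the first candidate that is a directory, else raise."""
    tried = []
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
        tried.append(str(candidate))
    raise FileNotFoundError(
        "Could not locate configs directory. Tried: " + ", ".join(tried)
    )


def get_config_dir(package: str = "winnow", repo_root: Optional[Path] = None) -> Path:
    """Prefer configs shipped inside the installed package, then a dev checkout."""
    candidates = []
    try:
        # Package mode: configs bundled as package data.
        candidates.append(Path(str(files(package).joinpath("configs"))))
    except ModuleNotFoundError:
        pass  # package not importable; rely on the dev fallback below
    if repo_root is not None:
        # Dev mode: running from a cloned repository.
        candidates.append(repo_root / package / "configs")
        candidates.append(repo_root / "configs")
    return first_existing_dir(candidates).resolve()
```

Resolving to an absolute path matters here because Hydra's `initialize_config_dir()` expects an absolute `config_dir`, whereas `initialize()` interprets `config_path` relative to the calling file.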

Configs are now in winnow/configs/

The winnow/configs folder is not checked in yet, so running winnow train gives:

Traceback (most recent call last)

/home/j-vangoey/code/winnow/winnow/scripts/main.py:352 in train

  349     """Passes control directly to the Hydra training pipeline."""
  350     # Capture extra arguments as Hydra overrides (--config-dir already parsed out by Typ
  351     overrides = ctx.args if ctx.args else None
❱ 352     train_entry_point(overrides, config_dir=config_dir)
  353
  354
  355 @app.command(

locals: config_dir = None, ctx = <click.core.Context object at 0x76d835d88cd0>, overrides = None

/home/j-vangoey/code/winnow/winnow/scripts/main.py:165 in train_entry_point

  162     from winnow.scripts.config_path_utils import get_primary_config_dir
  163
  164     # Get primary config directory (custom if provided, otherwise package/dev)
❱ 165     primary_config_dir = get_primary_config_dir(config_dir)
  166
  167     # Initialise Hydra with primary config directory
  168     with initialize_config_dir(

locals: config_dir = None, execute = True, overrides = None

/home/j-vangoey/code/winnow/winnow/scripts/config_path_utils.py:190 in get_primary_config_dir

  187             f"package: {package_path}) -> {merged_dir}"
  188         )
  189         return merged_dir
❱ 190     return get_config_dir().resolve()
  191

locals: custom_config_dir = None

/home/j-vangoey/code/winnow/winnow/scripts/config_path_utils.py:66 in get_config_dir

  63     if alt_dev_configs.exists() and alt_dev_configs.is_dir():
  64         return alt_dev_configs
  65
❱ 66     raise FileNotFoundError(
  67         f"Could not locate configs directory. Tried:\n"
  68         f" - Package configs: winnow.configs\n"
  69         f" - Dev configs: {dev_configs}\n"

locals: alt_dev_configs = PosixPath('/home/j-vangoey/code/winnow/configs'), config_path = PosixPath('/home/j-vangoey/code/winnow/winnow/configs'), dev_configs = PosixPath('/home/j-vangoey/code/winnow/winnow/configs'), repo_root = PosixPath('/home/j-vangoey/code/winnow'), script_dir = PosixPath('/home/j-vangoey/code/winnow/winnow/scripts')

FileNotFoundError: Could not locate configs directory. Tried:
 - Package configs: winnow.configs
 - Dev configs: /home/j-vangoey/code/winnow/winnow/configs
 - Alt dev configs: /home/j-vangoey/code/winnow/configs

BioGeek · 2025-12-01T11:45:48Z

docs/cli.md

 1. **Model checkpoints** (in `--model-output-folder`):
   - `calibrator.pkl`: Complete trained calibrator with all features and parameters

 2. **Training results** (`--dataset-output-path`):


This still references the old CLI style --dataset-output-path instead of the Hydra style dataset_output_path.

Ah good catch, thanks


github-actions · 2025-12-04T22:32:48Z

Coverage Report

File	Stmts	Miss	Cover	Missing
__init__.py	0	0	100%
data_types.py	4	0	100%
calibration
__init__.py	0	0	100%
calibration_features.py	265	11	95%	162–163, 326–328, 565–567, 1043–1045
calibrator.py	91	15	83%	69–70, 72, 106–109, 134–135, 137, 162–163, 167, 194–195
datasets
__init__.py	0	0	100%
calibration_dataset.py	86	12	86%	137, 190, 192–193, 199–202, 204–207
data_loaders.py	205	205	0%	7–18, 20–21, 27, 30, 43, 50–51, 62–64, 67, 69, 74, 79, 81–82, 91, 93–96, 98, 102–103, 105, 107, 109–110, 122, 128, 137–138, 142, 144, 160–161, 163–165, 167–169, 171–172, 174, 176, 188, 191–192, 194–195, 200, 206–207, 216, 219, 229, 234, 247–249, 258, 260, 272, 277–279, 281, 285–286, 288–289, 294, 297, 304, 310, 312, 324–325, 328, 331–332, 341, 350, 353, 377, 390, 397–398, 407–408, 410–413, 415, 419–420, 422, 424–425, 434–436, 439–440, 442, 458–459, 461–462, 465–467, 472, 475–476, 479–480, 486–487, 489, 491, 501, 504, 516–517, 524, 530, 534, 536, 539, 543, 545, 561, 573, 586–587, 589–590, 593–595, 605–606, 608, 610–612, 614, 625, 627, 638–639, 643, 647, 649, 661, 668, 681, 691–692, 703, 738, 744, 764, 767, 780, 787, 800–801, 803–805, 811–815, 820–821, 824, 827, 832, 838–844, 850–851
interfaces.py	3	3	0%	6–8
psm_dataset.py	25	0	100%
fdr
__init__.py	0	0	100%
base.py	58	15	74%	81, 85–86, 91, 98–99, 105, 126, 129–130, 135, 137–138, 144, 186
database_grounded.py	25	0	100%
nonparametric.py	25	4	84%	62, 68–69, 72
scripts
__init__.py	0	0	100%
config_formatter.py	53	53	0%	3–5, 8, 16, 27, 29, 31, 37–38, 40–42, 44, 46, 55, 58–60, 62–63, 66–69, 72–74, 77–78, 80, 82, 91, 93, 102, 104, 113, 115, 127–128, 130–132, 134, 145–147, 150, 153–154, 157–158, 160
config_path_utils.py	76	5	93%	24–26, 117–118
main.py	136	136	0%	8, 10–14, 17–21, 24–25, 27–29, 33, 40, 45, 48, 54, 56–57, 60, 69, 77, 80, 87, 89–91, 93, 95–100, 103, 105–106, 111, 126, 129, 135–136, 138–140, 143–144, 147, 160–162, 165, 168, 173, 175–177, 179, 181–182, 185–186, 189, 191–192, 194, 196–197, 200–201, 204–205, 208–209, 212–213, 215, 218, 232–234, 237, 240, 245, 247–249, 251–252, 254–255, 258–259, 262, 264–265, 267, 269–270, 273–274, 280–281, 284–285, 288–289, 292–293, 301–302, 305–308, 312, 315, 338, 351–352, 355, 380, 393–394, 397, 412, 424–425, 428, 443, 455–456
TOTAL	1052	459	56%

Tests	Skipped	Failures	Errors	Time
137	0 💤	0 ❌	0 🔥	36.657s ⏱️

…nstalled as a package

chore: fix pre-commit on main script

chore: remove testing Make commands

fix: correct the path for config_path_utils

chore: pre-commit formatting fixes for test_config_paths


JemmaLDaniel added 7 commits November 26, 2025 11:41

chore: add hydra to project dependencies

a2ae8d3

feat: use hydra to configure winnow runs

d51a264

test: update tests to use extra init arguments

8d3a02a

feat: add winnow config command to view resolved configuration

5e730c7

docs: document hydra config usage with winnow cli

20ee8b3

docs: make docs titles sentence case and fix bullet list formatting

2529582

perf: optimise CLI startup time with lazy imports

d7e713c

JemmaLDaniel requested a review from BioGeek November 26, 2025 18:18

JemmaLDaniel self-assigned this Nov 26, 2025

JemmaLDaniel added enhancement New feature or request documentation Improvements or additions to documentation labels Nov 26, 2025

JemmaLDaniel added 2 commits November 26, 2025 18:31

chore: merge branch 'main' into feat-hydra-config

07bfc18

chore: update gitignore to ignore extra supported files and images

980a793

BioGeek requested changes Dec 1, 2025


JemmaLDaniel added 6 commits December 4, 2025 10:09

Merge branch 'main' into feat-hydra-config

b18bd54

fix: convert predictions_path to a Path before file loading

bb25d28

docs: add instructions on conversion from mgf to parquet file

a883fcd

docs: remove references to old Typer CLI arguments

e9126d9

feat: create toy data for CLI quickstart

864095e

chore: pre-commit edits to generate_sample_data

docs: add documentation for quickstarting with the toy data

d614fcf

JemmaLDaniel force-pushed the feat-hydra-config branch 4 times, most recently from f0bafe0 to b459d60 Compare December 4, 2025 18:59

JemmaLDaniel requested a review from BioGeek December 4, 2025 19:03

JemmaLDaniel added 4 commits December 8, 2025 12:20

chore: update example notebook with new object instantiation argument…

ad8b1f5

…s and using config defaults

ci: migrate coverage badge to Gist-based dynamic system

00a006b

chore: track new config position

571b3b3

JemmaLDaniel force-pushed the feat-hydra-config branch from b1f5a96 to 571b3b3 Compare December 8, 2025 12:25

