-
Notifications
You must be signed in to change notification settings - Fork 1
Implement Typer + Hydra Configuration Architecture #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| winnow train data_loader=mztab model_output_dir=models/my_model | ||
|
|
||
| # Specify dataset paths | ||
| winnow train dataset.spectrum_path_or_directory=data/spectra.parquet dataset.predictions_path=data/preds.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try this:
winnow train dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv
I get:
[...]
ConfigCompositionException: Could not override 'dataset.predictions'.
To append to your config use +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv
When I try that:
winnow train dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv
I get:
[...]
│ /home/j-vangoey/code/winnow/winnow/datasets/data_loaders.py:62 in _load_beam_preds │
│ │
│ 59 │ │ Returns: ╭───────────────── locals ──────────────────╮ │
│ 60 │ │ │ Tuple[pl.DataFrame, pl.DataFrame]: A tuple containing the predictions and be │ predictions_path = 'data/predictions.csv' │ │
│ 61 │ │ """ ╰───────────────────────────────────────────╯ │
│ ❱ 62 │ │ if predictions_path.suffix != ".csv": │
│ 63 │ │ │ raise ValueError( │
│ 64 │ │ │ │ f"Unsupported file format for InstaNovo beam predictions: {predictions_p │
│ 65 │ │ │ ) │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'str' object has no attribute 'suffix'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be dataset.predictions_path, not dataset.predictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see predictions_path was coming in from the config as a string, so I'll convert to a Path before file loading.
| --output-folder ./predictions | ||
| ```bash | ||
| # Change MLP architecture | ||
| winnow train calibrator.hidden_layer_sizes=[100,50,25] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try this
winnow train calibrator.hidden_layer_sizes=[100,50,25]
I get:
zsh: no matches found: calibrator.hidden_layer_sizes=[100,50,25]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, strange. This works fine for me
|
|
||
| ### InstaNovo Configuration | ||
| # Specify dataset paths | ||
| winnow predict dataset.spectrum_path_or_directory=data/spectra.parquet dataset.predictions_path=data/preds.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as earlier. When I try this:
winnow predict dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv
I get:
[ ...]
ConfigCompositionException: Could not override 'dataset.predictions'.
To append to your config use +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv
and when I try that:
winnow predict dataset.spectrum_path_or_directory=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/parquet/mouse/dataset-mus-musculus-train-0000-0001.parquet +dataset.predictions=/home/j-vangoey/code/InstaNovo-internal/data/nine-species-balanced/instanovo_af3456d3_9to1/mouse.csv
I get
[...]
AttributeError: 'str' object has no attribute 'suffix'
docs/cli.md
Outdated
|
|
||
| Winnow supports multiple input formats: | ||
|
|
||
| - **InstaNovo**: Parquet spectra + CSV predictions (beam search format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most people will have their input data in *.MGF so I think it would be good to either point to instructions on how to use instanovo convert to convert *.MGF to *.parquet or to add functionality to do that in winnow on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point! I will address mgf file loading in a new PR, and I can add a bit on this in the docs as a patch for now
| # Predict using pretrained model, InstaNovo predictions and default settings | ||
| winnow predict \ | ||
| dataset.spectrum_path_or_directory=data/test_spectra.parquet \ | ||
| dataset.predictions_path=data/test_predictions.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add small sample test_spectra.parquet and test_predictions.csv files to the repo (or add them to a new relase as assets and add a file to download them) so that people quickly have some sample files to play around with.
|
|
||
| ```bash | ||
| # Train with default settings | ||
| winnow train |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Training with default settings gives me:
[...]
FileNotFoundError: No such file or directory (os error 2): data/spectra.ipc
| winnow train | ||
|
|
||
| # Predict with default settings | ||
| winnow predict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
FileNotFoundError: No such file or directory (os error 2): data/spectra.ipc
winnow/scripts/main.py
Outdated
| from hydra.utils import instantiate | ||
|
|
||
| with initialize( | ||
| config_path="../../config", version_base="1.3", job_name="winnow_train" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config_path="../../config" won't work when we distribute the package via PyPI. To confirm:
- Build the
winnow-fdrpackage.
$ uv build
Building source distribution...
[...]
Successfully built dist/winnow_fdr-1.0.3.tar.gz
Successfully built dist/winnow_fdr-1.0.3-py3-none-any.whl
- Install this wheel
$ cd /tmp
$ uv init winnow_demo
Initialized project `winnow-demo` at `/tmp/winnow_demo`
$ cd winnow_demo
$ uv add ~/code/winnow/dist/winnow_fdr-1.0.3-py3-none-any.whl
Using CPython 3.13.6
Creating virtual environment at: .venv
Resolved 167 packages in 1.94s
Prepared 23 packages in 2m 23s
[...]
$ source .venv/bin/activate
$ winnow config train
[...]
MissingConfigException: Primary config directory not found.
Check that the config directory '/tmp/winnow_demo/.venv/lib/python3.13/site-packages/config' exists
and readable
We have had the same problem in InstaNovo. The solution is to move your config folder inside the winnow folder
winnow/
config/
data_loader
instanovo.yaml
[...]
and the use importlib:
from importlib.resources import files
def train_entry_point(overrides=None, execute=True):
from hydra import initialize, compose
from hydra.utils import instantiate
from hydra.core.global_hydra import GlobalHydra
# Reset Hydra if called multiple times in same process
GlobalHydra.instance().clear()
# Resolve config directory inside package
config_dir = files("winnow").joinpath("config")
with initialize(
config_path=str(config_dir),
version_base="1.3",
job_name="winnow_train",
):
cfg = compose(config_name="train", overrides=overrides)
if not execute:
print_config(cfg)
returnThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In response to the config path issue, I have made the following changs :
- Moved configs inside package - Configs are now in
winnow/configs/as suggested - Used
importlib.resources.files()- Implementedget_config_dir()inwinnow/scripts/config_path_utils.pythat usesfiles("winnow").joinpath("configs")for package mode - Added package data - Updated
pyproject.tomlto include configs in the built package - Switched to
initialize_config_dir()- Changed frominitialize(config_path=...)toinitialize_config_dir(config_dir=...)to handle absolute paths correctly
The solution includes a fallback to dev mode when running from a cloned repo, and also adds support for custom config directories with partial overrides.
Tested and confirmed working when installed from a wheel from my side. Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Configs are now in
winnow/configs/
The winnow/configs folder is not checked in yet, so running winnow train gives:
╭───────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────────────────────────────────────────────╮
│ /home/j-vangoey/code/winnow/winnow/scripts/main.py:352 in train │
│ │
│ 349 │ """Passes control directly to the Hydra training pipeline.""" ╭────────────────────────── locals ──────────────────────────╮ │
│ 350 │ # Capture extra arguments as Hydra overrides (--config-dir already parsed out by Typ │ config_dir = None │ │
│ 351 │ overrides = ctx.args if ctx.args else None │ ctx = <click.core.Context object at 0x76d835d88cd0> │ │
│ ❱ 352 │ train_entry_point(overrides, config_dir=config_dir) │ overrides = None │ │
│ 353 ╰────────────────────────────────────────────────────────────╯ │
│ 354 │
│ 355 @app.command( │
│ │
│ /home/j-vangoey/code/winnow/winnow/scripts/main.py:165 in train_entry_point │
│ │
│ 162 │ from winnow.scripts.config_path_utils import get_primary_config_dir ╭───── locals ──────╮ │
│ 163 │ │ config_dir = None │ │
│ 164 │ # Get primary config directory (custom if provided, otherwise package/dev) │ execute = True │ │
│ ❱ 165 │ primary_config_dir = get_primary_config_dir(config_dir) │ overrides = None │ │
│ 166 │ ╰───────────────────╯ │
│ 167 │ # Initialise Hydra with primary config directory │
│ 168 │ with initialize_config_dir( │
│ │
│ /home/j-vangoey/code/winnow/winnow/scripts/config_path_utils.py:190 in get_primary_config_dir │
│ │
│ 187 │ │ │ f"package: {package_path}) -> {merged_dir}" ╭───────── locals ─────────╮ │
│ 188 │ │ ) │ custom_config_dir = None │ │
│ 189 │ │ return merged_dir ╰──────────────────────────╯ │
│ ❱ 190 │ return get_config_dir().resolve() │
│ 191 │
│ │
│ /home/j-vangoey/code/winnow/winnow/scripts/config_path_utils.py:66 in get_config_dir │
│ │
│ 63 │ if alt_dev_configs.exists() and alt_dev_configs.is_dir(): ╭───────────────────────────────── locals ──────────────────────────────────╮ │
│ 64 │ │ return alt_dev_configs │ alt_dev_configs = PosixPath('/home/j-vangoey/code/winnow/configs') │ │
│ 65 │ │ config_path = PosixPath('/home/j-vangoey/code/winnow/winnow/configs') │ │
│ ❱ 66 │ raise FileNotFoundError( │ dev_configs = PosixPath('/home/j-vangoey/code/winnow/winnow/configs') │ │
│ 67 │ │ f"Could not locate configs directory. Tried:\n" │ repo_root = PosixPath('/home/j-vangoey/code/winnow') │ │
│ 68 │ │ f" - Package configs: winnow.configs\n" │ script_dir = PosixPath('/home/j-vangoey/code/winnow/winnow/scripts') │ │
│ 69 │ │ f" - Dev configs: {dev_configs}\n" ╰───────────────────────────────────────────────────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: Could not locate configs directory. Tried:
- Package configs: winnow.configs
- Dev configs: /home/j-vangoey/code/winnow/winnow/configs
- Alt dev configs: /home/j-vangoey/code/winnow/configs
docs/cli.md
Outdated
| 1. **Model checkpoints** (in `--model-output-folder`): | ||
| - `calibrator.pkl`: Complete trained calibrator with all features and parameters | ||
|
|
||
| 2. **Training results** (`--dataset-output-path`): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still references the old CLI style --dataset-output-path instead of the Hydra style dataset_output_path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch, thanks
chore: pre-commit edits to generate_sample_data
f0bafe0 to
b459d60
Compare
…nstalled as a package chore: fix pre-commit on main script chore: remove testing Make commands fix: correct the path for config_path_utils fix: correct the path for config_path_utils chore: pre-commit formatting fixes for test_config_paths
…s and using config defaults
b1f5a96 to
571b3b3
Compare
Summary
This PR implements the Typer + Hydra hybrid architecture proposed in #146, refactoring Winnow's configuration management from flat CLI signatures to a flexible, hierarchical system that enables scalable configuration of complex nested components and automatic object instantiation.
Implementation Details
1. Typer + Hydra Hybrid Architecture
Typer now acts as a thin command dispatcher, passing all configuration to Hydra:
2. Structured Configuration with Composition
Created modular configuration structure in
config/:train.yaml/predict.yaml- Main pipeline configurationscalibrator.yaml- Model architecture and featuresresidues.yaml- Amino acid masses and modifications (shared via composition)data_loader/- Pluggable dataset format loaders (InstaNovo, MZTab, PointNovo, Winnow)fdr_method/- Pluggable FDR methods (nonparametric, database-grounded)Configuration files use Hydra's
defaultsmechanism to compose shared components.3. Hydra-Based Object Instantiation
Used Hydra's
_target_field for automatic instantiation:if/eliflogic_target_pointing to their classes4. Configuration Inspection Commands
Added
winnow configcommand group:winnow config train- Display resolved training configurationwinnow config predict- Display resolved prediction configurationImplemented custom
ConfigFormatterclass with hierarchical colour-coding based on YAML nesting depth for improved terminal readability.5. Lazy Imports for CLI Performance
Implemented lazy import pattern using
TYPE_CHECKINGto defer heavy dependencies (PyTorch, InstaNovo, etc.) until command execution. This makes--helpandconfigcommands respond instantly whilst pipeline commands still have access to all required dependencies.Added module-level docstring in
main.pyexplaining the rationale.6. Documentation Updates
Minor improvements to CLI help text and documentation to reflect the new Hydra-based configuration system with examples of dot-notation overrides.
Migration Notes
Existing users will need to:
config/instead of passing all parameters via CLI flagswinnow train calibrator.seed=42winnow config <pipeline>to inspect resolved configurations