Feature Request: Scalable Configuration Management and Improved CLI Reporting
Summary
This issue proposes migrating Winnow's configuration management from its current flat Typer structure to a robust, hierarchical Typer + Hydra architecture. This refactoring step is necessary to enable built-in experiment management (hyperparameter tuning) and scalable configuration of complex, nested components, while introducing a powerful, self-documenting CLI command (`winnow config show`).
1. Current Limitations (The Problem)
The current project structure relies on two inflexible methods for defining critical values: flat Python function signatures and hard-coded global variables/dictionaries.
Current Typer Command Signatures
```python
# predict command:
def predict(
    data_source: Annotated[...],
    dataset_config_path: Annotated[...],  # Path to config file, but params aren't directly configurable
    method: Annotated[...],
    fdr_threshold: Annotated[...],
    confidence_column: Annotated[...],
    # ...
):

# train command:
def train(
    data_source: Annotated[...],
    dataset_config_path: Annotated[...],  # Same issue here
    model_output_dir: Annotated[...],
    dataset_output_path: Annotated[...],
    learn_prosit_missing: Annotated[bool, ...],  # Flat boolean flag
    learn_chimeric_missing: Annotated[bool, ...],  # Another flat boolean flag
    # ...
):
```
This structure leads to significant scalability and maintenance limitations:
- Rigid, Flat Structure: Every new configuration option must be added as a separate, mandatory, top-level argument to the function signature, making the command line cumbersome and difficult to read.
- Difficulty with Nested Configuration: It is difficult to configure nested objects or pass complex structures like a dictionary of column renames, forcing the use of separate, poorly integrated config files (`dataset_config_path`) alongside CLI arguments.
- Hard-Coded Constants and Global Variables: Shared values like `RESIDUE_MASSES` (from `winnow.constants.py`) and simple hyperparameters like `SEED` and `MZ_TOLERANCE` (from `main.py` globals) are currently hard-coded in Python files. This prevents:
  - CLI Overrides: Users cannot easily override these values (e.g., change `MZ_TOLERANCE`) via the command line.
  - Composition: We cannot easily swap out large datasets (e.g., a different set of residue masses) via Hydra configuration files.
  - Documentation: These values are not documented or validated by the configuration system.
- Lack of Experiment Management: The current system cannot support built-in hyperparameter tuning via multirun or automatically log the exact configuration used for any given run, which is useful for reproducible scientific computing.
- Poor Argument Discovery: Users lack a single, reliable command to view all configurable parameters, their types, defaults and descriptions.
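To make the CLI-override limitation concrete, here is a minimal, hypothetical sketch (not Winnow code) of Hydra-style `key=value` overrides applied to a defaults mapping; the `seed` and `mz_tolerance` keys mirror the hard-coded globals mentioned above, and the values shown are invented for illustration:

```python
# Minimal sketch of Hydra-style "key=value" CLI overrides (illustrative only;
# real Hydra also supports nesting, type checking, interpolation, etc.).
def apply_overrides(defaults: dict, overrides: list[str]) -> dict:
    """Return a copy of `defaults` with each "key=value" override applied."""
    resolved = dict(defaults)
    for item in overrides:
        key, _, raw = item.partition("=")
        if key not in resolved:
            raise KeyError(f"Unknown config key: {key}")
        # Coerce the string to the type of the default value.
        resolved[key] = type(resolved[key])(raw)
    return resolved

# Hypothetical defaults standing in for the main.py globals.
DEFAULTS = {"seed": 42, "mz_tolerance": 0.02}

print(apply_overrides(DEFAULTS, ["mz_tolerance=0.05"]))
```

With the current hard-coded globals, changing `mz_tolerance` for a single run requires editing Python source; with a config system, it is a one-token override on the command line.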
2. Proposed Solution: Typer + Hydra Hybrid Architecture
We will adopt Hydra as the single source of truth for all configuration data, responsible for parameter defaulting, overriding and object instantiation. Typer will be reduced to a thin command dispatcher for `winnow train`, `winnow predict` and the new reporting command.
A. Architectural Components
- Configuration Composition: We will split configuration into small, logical YAML files. The main pipeline files will include the constants file via Hydra's `defaults` keyword, ensuring constants like `RESIDUE_MASSES` are defined only once but are available everywhere in the configuration object.
- Hydra Instantiation and Extensibility: Hydra will use the `_target_` field in the configuration to automatically instantiate complex objects (like the `ProbabilityCalibrator` and its base features) based on configuration, eliminating manual initialisation logic in our Python code.
  - Extending Feature Sets: If a power user creates a new feature class (e.g., `NewCalibrationFeature`), they simply define a new YAML file that points the `_target_` field to that class (`your_module.calibration_features.NewCalibrationFeature`). They can then inject this new feature into the calibrator list via a simple command-line override, without changing any core Python code.
  - Extending Data Loading: The old `--data-source` flag, which required manual `if`/`elif` logic, is replaced by configuration. A user implements a new `NewDataLoader` class and defines its parameters in a YAML file using `_target_`. They can then swap the data source for a run: `winnow predict data_source=new_source`.
- Typer Dispatch: The `train` and `predict` Typer functions will be stripped of their configuration arguments and configured to pass all command-line input directly to Hydra for processing.
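As a rough illustration of the `_target_` mechanism, here is a simplified sketch of what Hydra-style instantiation does (in real code, Hydra's `hydra.utils.instantiate` handles this; the `fractions.Fraction` target below is just a stdlib stand-in for a class like `ProbabilityCalibrator` or a data loader):

```python
import importlib

def instantiate(config: dict):
    """Build the object named by `_target_`, passing the remaining keys as kwargs.

    A simplified sketch of Hydra's instantiate; real Hydra additionally
    handles recursive instantiation, positional args, partials, etc.
    """
    module_path, _, class_name = config["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return cls(**kwargs)

# The dict mirrors what a YAML file containing a _target_ field resolves to.
config = {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 3}
print(instantiate(config))  # builds fractions.Fraction(numerator=1, denominator=3)
```

Swapping the instantiated class is then purely a configuration change: point `_target_` at a different import path and the Python code stays untouched.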
B. New Feature: Configuration Inspection
We will introduce a new `winnow config show` command for viewing the resolved configuration. This command will display the fully resolved configuration for the specified pipeline (e.g., `train` or `predict`), showing all parameter values after defaults, composition and any command-line overrides have been applied.
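A minimal sketch of what such a command could do under the hood (a hypothetical helper, not the actual implementation; in real Hydra the composed config is rendered with `OmegaConf.to_yaml`, whereas this sketch prints JSON to stay dependency-free, and all keys and values shown are invented):

```python
import json

def show_config(defaults: dict, *layers: dict) -> str:
    """Merge config layers left to right (later layers win) and pretty-print.

    Mimics the resolution order described above: defaults first, then
    composed config files, then command-line overrides.
    """
    resolved = dict(defaults)
    for layer in layers:
        resolved.update(layer)
    return json.dumps(resolved, indent=2, sort_keys=True)

defaults = {"mz_tolerance": 0.02, "seed": 42}
composed = {"fdr_threshold": 0.01}   # e.g. from a hypothetical predict.yaml
overrides = {"mz_tolerance": 0.05}   # e.g. from the command line
print(show_config(defaults, composed, overrides))
```

The value of the command is exactly this transparency: the user sees the final values the pipeline will run with, not the scattered inputs they came from.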