Refactor Plan for `Running-Optimizer`

1. Executive Summary

The repository currently suffers from a "script-heavy" architecture: core business logic (training loops, feature orchestration) is embedded in scripts/ or app.py rather than in reusable library modules. This leads to code duplication, makes testing difficult, and forces fragile import hacks (e.g., sys.argv manipulation).

This refactor will move all reusable logic into src/, treating scripts/ strictly as CLI entry points. It will also standardize path handling and configuration.

2. Current Architecture & Smells

Path: Data -> Features -> Train -> Eval -> Predict

Data: data/raw/ -> scripts/ingest_local_csv.py (implied) or app.py loading directly.
Features: scripts/make_features.py calls src.features but also contains filtering logic. Output: data/processed/features_*.csv.
Train: scripts/train_model.py defines pipelines, performs grid search, and saves to models/.
Predict/Eval: scripts/predict.py and scripts/evaluate_model.py.

Architectural Smells

Fat Scripts: scripts/train_model.py contains ~200 lines of model definitions, cross-validation loops, and plotting logic.
Duplicated Logic: app.py reimplements run filtering/cleaning logic found partly in src/features.py.
Import Hacks: src/pipeline.py mocks sys.argv to call scripts.make_features.main.
Hardcoded Paths: REPO_ROOT is redefined in almost every file.
Hidden Dependencies: src/models.py hardcodes paths to data/processed/.

3. Target Architecture

Running-Optimizer/
├── archive/                 # Deprecated scripts/modules
├── configs/                 # YAML configs
├── data/                    # Data artifacts (ignored by git)
├── scripts/                 # Thin CLI wrappers
│   ├── make_features.py
│   ├── train_model.py
│   ├── evaluate_model.py
│   └── predict.py
├── src/
│   ├── __init__.py
│   ├── config.py            # Centralized config & path definitions
│   ├── data/
│   │   ├── __init__.py
│   │   ├── io.py            # Loaders/Savers (abstract CSV/Parquet paths)
│   │   └── clean.py         # Domain cleaning (moving time, pause ratio)
│   ├── features/
│   │   ├── __init__.py
│   │   ├── generator.py     # Orchestration (was make_features.py)
│   │   └── transformations.py # Core math (rolling windows, etc.)
│   ├── models/
│   │   ├── __init__.py
│   │   ├── registry.py      # Model pipeline definitions
│   │   ├── training.py      # CV loops, Grid Search logic
│   │   └── evaluation.py    # Metrics, plots
│   └── visualization/       # Plotting helpers
├── tests/
└── app.py                   # Streamlit app (imports from src)

4. Migration Plan

Phase 1: Foundation (Paths & Config)

Goal: Remove REPO_ROOT duplication and centralize constants.

Action: Create src/config.py defining REPO_ROOT, DATA_DIR, MODELS_DIR.
Refactor: Update src/utils.py and others to import paths from src/config.py.
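
A minimal sketch of what src/config.py could look like, assuming the file sits at src/config.py so that the repo root is two levels above it. The helper name find_repo_root is hypothetical, and a literal path stands in for __file__ so the snippet runs outside the repository.

```python
from pathlib import Path


def find_repo_root(config_file: Path) -> Path:
    """Return the repo root, assuming config_file is <repo>/src/config.py."""
    # parents[0] is src/, parents[1] is the repository root.
    return config_file.resolve().parents[1]


# Inside the real src/config.py this would be:
#   REPO_ROOT = find_repo_root(Path(__file__))
# A literal path is used here only so the sketch runs anywhere.
REPO_ROOT = find_repo_root(Path("Running-Optimizer/src/config.py"))
DATA_DIR = REPO_ROOT / "data"
MODELS_DIR = REPO_ROOT / "models"
```

Other modules would then write `from src.config import DATA_DIR` instead of recomputing the root themselves.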

Phase 2: Data & Features

Goal: Decouple feature generation from the script.

Action: Move app.py cleaning logic to src/data/clean.py (function: clean_raw_runs).
Action: Move scripts/make_features.py logic to src/features/generator.py (function: generate_features_dataset).
Update: scripts/make_features.py becomes a 10-line wrapper.
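
The thin wrapper might look roughly like the sketch below. The --dataset and --inp flags come from the runbook in this plan. The signature of generate_features_dataset and the output file name are assumptions, and a stub replaces the real import so the example is self-contained.

```python
import argparse
from pathlib import Path


def generate_features_dataset(dataset: str, inp: Path) -> Path:
    """Stand-in for src.features.generator.generate_features_dataset.

    The real signature is not specified in this plan. This stub only
    returns the path where the features file would be written; the
    features_<dataset>.csv naming is an assumption.
    """
    return Path("data/processed") / f"features_{dataset}.csv"


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Generate the features dataset.")
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--inp", type=Path, required=True)
    # argv=None makes argparse fall back to sys.argv[1:], so the same
    # function serves both the CLI and direct calls from Python code.
    args = parser.parse_args(argv)
    out_path = generate_features_dataset(args.dataset, args.inp)
    print(f"Features written to {out_path}")
    return 0


# The real script would end with:
#   if __name__ == "__main__":
#       raise SystemExit(main())
```

Because main accepts an explicit argument list, a caller can run `main(["--dataset", "dhruva", "--inp", "data/raw/runs.csv"])` without patching sys.argv, although calling generate_features_dataset directly is simpler still.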

Phase 3: Training & Models

Goal: Make training testable and importable.

Action: Move model definitions (Ridge, RF, pipelines) from scripts/train_model.py to src/models/registry.py.
Action: Move the CV/GridSearch loop to src/models/training.py (function: run_training_job).
Update: scripts/train_model.py becomes a wrapper calling run_training_job.
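
One possible shape for src/models/registry.py is a name-to-factory mapping, sketched below. The names MODEL_REGISTRY, register, and build_model are hypothetical. In the real module the factories would presumably return the Ridge and RF pipelines; plain dicts stand in for them here so the sketch has no third-party dependencies.

```python
from typing import Callable, Dict

# Maps a model name to a zero-argument factory that builds a fresh,
# unfitted model. Dicts are placeholders for real pipeline objects.
MODEL_REGISTRY: Dict[str, Callable[[], dict]] = {}


def register(name: str):
    """Decorator that adds a factory to the registry under `name`."""
    def decorator(factory):
        if name in MODEL_REGISTRY:
            raise ValueError(f"Model {name!r} is already registered")
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register("ridge")
def make_ridge() -> dict:
    # Placeholder step names, not the project's actual pipeline.
    return {"steps": ["scale", "ridge"]}


@register("random_forest")
def make_random_forest() -> dict:
    return {"steps": ["random_forest"]}


def build_model(name: str) -> dict:
    """Build a new model instance by name, with a readable error if unknown."""
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        known = sorted(MODEL_REGISTRY)
        raise KeyError(f"Unknown model {name!r}; known models: {known}") from None
    return factory()
```

With this layout, run_training_job could iterate over MODEL_REGISTRY rather than defining models inline, and tests could build each model without running a training job.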

Phase 4: App Integration

Goal: Ensure app.py uses the same cleaning logic as the pipeline.

Action: Update app.py to import clean_raw_runs from src/data/clean.py.

Phase 5: Cleanup

Action: Move scripts/train_baseline.py and scripts/convert_strava_activities.py to archive/ if unused.
Action: Remove src/pipeline.py (replaced by Makefile or simple script chaining).

5. Golden Path (Runbook)

1. Install

make install
source venv/bin/activate

2. Generate Features

# Uses src/features/generator.py
python scripts/make_features.py --dataset dhruva --inp data/raw/runs.csv

3. Train Model

# Uses src/models/training.py
python scripts/train_model.py --name dhruva --table 5k

4. Predict/Eval

python scripts/evaluate_model.py --name dhruva --split test

6. Risks & Mitigations

Risk: app.py breakage.
- Mitigation: Run streamlit run app.py locally after Phase 2 and Phase 4.
Risk: Circular imports once logic is split across src/ packages (e.g., config importing back from features or models).
- Mitigation: Keep config.py dependency-free and enforce a strict import hierarchy: models -> features -> data -> config.
Risk: Path resolution in Streamlit vs CLI.
- Mitigation: Use pathlib relative to __file__ in src/config.py to robustly find the repo root.
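
The import hierarchy could be checked automatically, for example in a test. The sketch below parses a module's source with the standard ast module and reports any src sub-package it imports from a layer it should not depend on. The ALLOWED table and both function names are assumptions for illustration, and relative imports are ignored for simplicity.

```python
import ast

# Allowed dependencies per layer, following
# models -> features -> data -> config (arrows point at dependencies).
ALLOWED = {
    "config": set(),
    "data": {"config"},
    "features": {"data", "config"},
    "models": {"features", "data", "config"},
}


def src_imports(source: str) -> set:
    """Return the src sub-packages that a module's source code imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Cover both "from src.config import X" and "from src import config".
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if parts[0] == "src" and len(parts) > 1:
                found.add(parts[1])
    return found


def layer_violations(layer: str, source: str) -> set:
    """Return imported src sub-packages that `layer` must not depend on."""
    return src_imports(source) - ALLOWED[layer] - {layer}
```

A test could read each file under src/, pass its text to layer_violations with the name of its package, and fail if the returned set is non-empty.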

7. Verification Checklist

pytest passes (existing tests).
python scripts/make_features.py ... produces output identical to the pre-refactor output.
python scripts/train_model.py ... runs without error and saves models.
streamlit run app.py loads successfully.
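
The "identical output" item can be checked byte for byte by hashing a features file saved before the refactor and the one produced after it. This is a minimal sketch using only the standard library; the function names are hypothetical, and a byte-level comparison assumes feature generation is deterministic.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(before: Path, after: Path) -> bool:
    """True if the two files have exactly the same bytes."""
    return file_sha256(before) == file_sha256(after)
```

If the files differ only in row order or float formatting, this check reports a mismatch, so a column-wise comparison with a tolerance may be more appropriate in that case.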
