@arimtannn

Summary

Lock baseline pipeline and defaults for v1.0.0.

Changes

  • Set the best-performing Random Forest hyperparameters in config.yaml
  • Wire steps in main.py: basic_cleaning → data_check → data_split → train_random_forest
  • Add/complete tests in src/data_check/test_data.py
  • Finish training logic in src/train_random_forest/run.py (fit, save, log metrics)

Metrics (validation)

  • MAE: 33.84692
  • R²: 0.56193
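
For readers unfamiliar with the two metrics, here is a minimal sketch of how MAE and R² are defined, written in plain Python. The function names and the sample values are made up for illustration; the PR does not show how run.py computes these numbers, and it may well use a library routine instead.

```python
def mae_score(y_true, y_pred):
    """Mean absolute error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values, NOT data from this project.
y_val = [100.0, 150.0, 200.0, 250.0]
y_hat = [110.0, 140.0, 190.0, 270.0]

mae_value = mae_score(y_val, y_hat)
r2_value = r2(y_val, y_hat)
print(f"MAE: {mae_value:.5f}  R2: {r2_value:.5f}")
```

MAE is in the units of the target, so lower is better; R² is unitless, and 1.0 would mean the predictions explain all of the variance in the validation targets.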

How to run

mlflow run . -P steps=download,basic_cleaning,data_check,data_split,train_random_forest
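
The `steps` parameter is a comma-separated list of step names. A rough sketch of how such a value could be turned into an ordered list of steps follows. The step names come from this PR, but `PIPELINE_STEPS` and `select_steps` are hypothetical helpers, not code taken from main.py.

```python
# Step names are taken from the PR text; the helper itself is hypothetical.
PIPELINE_STEPS = [
    "download",
    "basic_cleaning",
    "data_check",
    "data_split",
    "train_random_forest",
]


def select_steps(steps_param):
    """Return the requested steps in pipeline order, rejecting unknown names."""
    requested = [s.strip() for s in steps_param.split(",") if s.strip()]
    unknown = [s for s in requested if s not in PIPELINE_STEPS]
    if unknown:
        raise ValueError(f"Unknown step(s): {', '.join(unknown)}")
    # Iterate over PIPELINE_STEPS so the result follows pipeline order,
    # regardless of the order given on the command line.
    return [s for s in PIPELINE_STEPS if s in requested]


active = select_steps("data_split,basic_cleaning")
print(active)
```

Sorting by pipeline order rather than by argument order is a design choice in this sketch: it keeps a mistyped ordering from running data_split before basic_cleaning.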

Notes

  • Test step (test_regression_model) is manual and requires promoting random_forest_export to prod in W&B.
  • Artifacts and local tracking dirs are ignored via .gitignore.

@arimtannn arimtannn closed this Oct 1, 2025