Description:
Right now, different subfolders load and process datasets in inconsistent ways. Create a unified, reusable Python module to handle dataset loading and basic preprocessing (e.g., scaling, splitting). This improves maintainability and reduces repeated code across the ML scripts.
Expected Tasks:
- Create a Python utility module (e.g., `data_utils.py`) with functions such as:
  - `load_csv(path)`
  - `train_test_split(X, y, test_size=0.2)`
  - `standardize(X)`
- Save this file in a `utils/` folder.
- Refactor at least two existing implementations (e.g., KNN and Perceptron) to use this utility.
- Update documentation in the root `README.md` and/or affected folders.
Stretch Goal:
- Add optional support for downloading public datasets (e.g., from UCI or sklearn).
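The utility module described above could be sketched as follows. The function names and signatures come from the task list; the internals are one possible dependency-free implementation (the `seed` parameter and the pure-Python CSV handling are assumptions, not requirements):

```python
# Hypothetical sketch of utils/data_utils.py -- names and signatures follow
# the task description; internals are one possible implementation.
import csv
import random


def load_csv(path):
    """Load a CSV file into a header list and a list of float rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [[float(v) for v in row] for row in reader if row]
    return header, rows


def train_test_split(X, y, test_size=0.2, seed=None):
    """Shuffle indices, then split features and labels into train/test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        (sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
        for col, m in zip(zip(*X), means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
```

A refactored script (e.g., the KNN one) would then reduce its loading code to something like `header, rows = load_csv("data/iris.csv")` followed by `standardize` and `train_test_split`, instead of carrying its own copies of that logic.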