
Standardize Dataset Handling Across All ML Modules #11

@HashSlap

Description

Right now, different subfolders load and process datasets in inconsistent ways. Create a unified, reusable Python module to handle dataset loading and basic preprocessing (e.g., scaling, splitting). This ensures maintainability and reduces repeated code across ML scripts.

Expected Tasks:

  • Create a Python utility (e.g., data_utils.py) with functions like:
    • load_csv(path)
    • train_test_split(X, y, test_size=0.2)
    • standardize(X)
  • Save this file in a utils/ folder.
  • Refactor at least two existing implementations (e.g., KNN and Perceptron) to use this utility.
  • Update documentation in the root README.md and/or affected folders.
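The tasks above could be sketched roughly as follows. This is only a hypothetical starting point, not a spec: the function names come from the task list, but the NumPy-based implementations, signatures beyond those listed, and the `seed` parameter are assumptions.

```python
# utils/data_utils.py — hypothetical sketch of the proposed shared utility.
import numpy as np


def load_csv(path):
    """Load a numeric CSV (assumes one header row) into a float array."""
    return np.genfromtxt(path, delimiter=",", skip_header=1)


def train_test_split(X, y, test_size=0.2, seed=None):
    """Shuffle rows and split features/labels into train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(len(X) * (1 - test_size))
    train, test = idx[:cut], idx[cut:]
    return X[train], X[test], y[train], y[test]


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero on constant columns
    return (X - mean) / std
```

An ML script such as the KNN or Perceptron implementation would then do `from utils.data_utils import load_csv, train_test_split, standardize` instead of carrying its own copy of this logic.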

Stretch Goal:

  • Add optional support for downloading public datasets (e.g., from UCI or sklearn).
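For the sklearn side of the stretch goal, a minimal hedged sketch might wrap the datasets bundled with scikit-learn behind one function (the function name and the set of supported datasets here are assumptions; a UCI downloader would additionally need URL handling and caching):

```python
# Hypothetical stretch-goal helper: fetch a public dataset by name.
from sklearn.datasets import load_iris, load_wine


def load_public_dataset(name="iris"):
    """Return (X, y) for a scikit-learn bundled dataset."""
    loaders = {"iris": load_iris, "wine": load_wine}
    if name not in loaders:
        raise ValueError(f"Unknown dataset: {name!r}")
    data = loaders[name]()
    return data.data, data.target
```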
