This repository was archived by the owner on Aug 16, 2025. It is now read-only.

Preprocessing of discrete variables #1

@janfrancu

I was digging around in the preprocessing code and found that there are two heuristics for identifying discrete variables: one checks whether all non-missing values are integers (L199), and the other counts unique values (L200). Which one did you use in the experiments, and why do you prefer it over the other?
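To make the difference concrete, here is a minimal sketch of the two heuristics as described above. It is not the repository's code: the function names, the `max_unique` threshold, and the sample arrays are hypothetical and only illustrate how the two checks can disagree on the same column.

```python
import numpy as np

def is_discrete_by_integrality(col):
    """Heuristic A: every non-missing value is a whole number."""
    vals = col[~np.isnan(col)]
    return vals.size > 0 and bool(np.all(vals == np.round(vals)))

def is_discrete_by_cardinality(col, max_unique=10):
    """Heuristic B: few distinct non-missing values.

    max_unique is an assumed threshold, not a value taken from the repository.
    """
    vals = col[~np.isnan(col)]
    return vals.size > 0 and np.unique(vals).size <= max_unique

# Hypothetical columns (NaN marks a missing value).
ages = np.array([23.0, 41.0, np.nan, 35.0, 62.0, 18.0,
                 77.0, 50.0, 29.0, 33.0, 46.0, 58.0])
ratings = np.array([0.5, 1.5, 0.5, 2.5, np.nan, 1.5])

# Integer-valued but with 11 distinct values: A says discrete, B does not.
print(is_discrete_by_integrality(ages), is_discrete_by_cardinality(ages))
# Non-integer but only 3 distinct values: B says discrete, A does not.
print(is_discrete_by_integrality(ratings), is_discrete_by_cardinality(ratings))
```

Under these assumptions, an integer-coded feature with many levels and a fractional feature with few levels get opposite labels, so which heuristic is used can change the detected variable types.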

Another thing I noticed is that the casting of binary variables to 0/1 values is applied to the whole dataset, not just to the particular column (L212). Fortunately, the heterogeneous data from the ABDA paper should have the binary variables removed. However, this is not the case with the Anneal-U dataset, whose 9th feature is preprocessed as categorical instead of binary, as I believe the authors probably used the value-range metadata. As a result, there is some discrepancy caused by the different "detection" of discrete variables, but I cannot judge whether it has a negative effect on the results for that particular dataset.
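The scoping problem can be illustrated with a small sketch. Both functions below are hypothetical and are not taken from the repository: `binarize_everywhere` mimics a recoding that matches values across the whole array, while `binarize_column` restricts the recoding to the one column that was detected as binary.

```python
import numpy as np

def binarize_everywhere(data, j):
    """Hypothetical leaky version: recodes the two levels of column j
    wherever they occur in the array, including in other columns."""
    out = data.copy()
    levels = np.unique(out[~np.isnan(out[:, j]), j])
    lo_mask, hi_mask = out == levels[0], out == levels[1]
    out[lo_mask] = 0.0
    out[hi_mask] = 1.0
    return out

def binarize_column(data, j):
    """Column-scoped version: only column j is recoded to 0/1."""
    out = data.copy()
    col = out[:, j]                      # view into out
    observed = ~np.isnan(col)
    levels = np.unique(col[observed])
    if levels.size != 2:
        raise ValueError("column %d is not binary" % j)
    col[observed] = (col[observed] == levels[1]).astype(out.dtype)
    return out

# Made-up data: column 0 is binary with levels {1, 2}; column 1 is not.
data = np.array([[1.0, 2.0],
                 [2.0, 5.0],
                 [1.0, 1.0],
                 [np.nan, 2.0]])

print(binarize_everywhere(data, 0)[:, 1])  # column 1 is altered as a side effect
print(binarize_column(data, 0)[:, 1])      # column 1 is left unchanged
```

In this toy example the whole-array version silently rewrites the 1s and 2s of the second column, which is the kind of side effect that could matter for a dataset that still contains binary features.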
