Open
Description
Hi, I understand that this may look like expected behaviour, but it can lead to unexpected results in the following scenario:
- an arff file uses quoted question marks as categorical values and in the data, e.g.
  @attribute feat1 {'?', 'A', 'B', 'C'}
- arff.load() reads those '?' as strings.
- arff.write() (for example after sampling the original data) then writes the '?' from the loaded data without quotes:
  @attribute feat1 {?, A, B, C}
- arff.load() on that last file interprets ? as a missing value (None).
See openml/automlbenchmark#209 for a hack implemented locally to prevent this. However, that hack also means it is no longer possible to represent missing values as ? in arff files saved with the library.
I suggest adding a parameter to the arff.dump signature, for example:
def dump(obj, fp, missing_values=[None, '?']):
    pass
This would allow the user to call arff.dump(o, f, missing_values=[None]) when ? should not be considered a missing value and should therefore be written quoted.
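To make the suggestion concrete, here is a minimal standalone sketch of how such a parameter could affect value encoding. It does not use or reproduce the library's internals: encode_value and decode_value are hypothetical helpers written only for illustration, and they ignore escaping of quotes inside values.

```python
def encode_value(value, missing_values=(None, '?')):
    """Hypothetical encoder for a single ARFF data value (illustration only)."""
    # Anything listed in missing_values is written as the bare missing marker.
    if value in missing_values:
        return '?'
    text = str(value)
    # A literal question mark (or a value with spaces/commas) must be quoted,
    # otherwise a reader cannot tell it apart from the missing marker.
    if text == '?' or ' ' in text or ',' in text:
        return "'" + text + "'"
    return text


def decode_value(token):
    """Hypothetical decoder mirroring the behaviour described in this issue."""
    if token == '?':
        return None  # bare question mark means "missing"
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]  # strip the surrounding quotes
    return token


# With the proposed default, the string '?' is lost on a round trip:
lossy = decode_value(encode_value('?'))
# With missing_values restricted to None, the string survives:
kept = decode_value(encode_value('?', missing_values=(None,)))
```

With the default, the string '?' comes back as None, while missing_values=(None,) (or [None], as in the call above) keeps it as the string '?' and still writes None as a bare ?.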