Open
Description
If a categorical attribute has only numeric values (which is valid, if somewhat underspecified, in the original ARFF definition), the package raises an error when writing the data to disk:
import arff

data = dict(
    relation='dataset name',
    description='dataset description',
    attributes=[("categorical_with_numeric_values", [1, 2, 3])],
    data=[[1], [2], [3]]
)
with open("test.arff", "w") as fh:
    arff.dump(data, fh)

Expected behavior: Produce a valid arff file with attribute:
@ATTRIBUTE categorical_with_numeric_values {1, 2, 3}
Actual behavior: The categories are treated as strings, so encoding them raises a TypeError:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/pietergijsbers/repositories/automlbenchmark/venv/lib/python3.9/site-packages/arff.py", line 1091, in dump
    for row in generator:
  File "/Users/pietergijsbers/repositories/automlbenchmark/venv/lib/python3.9/site-packages/arff.py", line 1028, in iter_encode
    yield self._encode_attribute(attr[0], attr[1])
  File "/Users/pietergijsbers/repositories/automlbenchmark/venv/lib/python3.9/site-packages/arff.py", line 964, in _encode_attribute
    type_tmp = [u'%s' % encode_string(type_k) for type_k in type_]
  File "/Users/pietergijsbers/repositories/automlbenchmark/venv/lib/python3.9/site-packages/arff.py", line 964, in <listcomp>
    type_tmp = [u'%s' % encode_string(type_k) for type_k in type_]
  File "/Users/pietergijsbers/repositories/automlbenchmark/venv/lib/python3.9/site-packages/arff.py", line 420, in encode_string
    if _RE_QUOTE_CHARS.search(s):
TypeError: expected string or bytes-like object
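The last frame points at the cause: a compiled regular expression's search method accepts only str or bytes, so an integer category fails there. A minimal standard-library sketch of that failure follows. QUOTE_CHARS is a hypothetical stand-in for the package's _RE_QUOTE_CHARS (the real pattern is not reproduced here), and needs_quoting is an illustrative name, not a function from the package.

```python
import re

# Hypothetical stand-in pattern; the real _RE_QUOTE_CHARS in arff.py may differ.
QUOTE_CHARS = re.compile(r"[\s,{}%]")


def needs_quoting(value):
    """Return True if the value contains a character matched by the stand-in pattern."""
    return QUOTE_CHARS.search(value) is not None


error = None
try:
    needs_quoting(1)  # an int category, as in the report
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Converting the category to str before the regex check avoids the error.
print(needs_quoting(str(1)))
```

The exact wording of the TypeError message varies between Python versions, so the sketch only relies on the exception type.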
A possible workaround is to stringify the categories (they are written unquoted in the resulting arff header):
- attributes=[("categorical_with_numeric_values", [1, 2, 3])],
+ attributes=[("categorical_with_numeric_values", ['1', '2', '3'])],
Python 3.11.3, liac-arff 2.5.0
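For datasets with many attributes, the workaround can be applied programmatically instead of by hand. The helper below is a sketch under the assumption that nominal attributes are declared as a list (or tuple) of categories while other attribute types are given as plain strings such as 'NUMERIC'. The name stringify_nominal_categories is illustrative and not part of liac-arff, and the data rows are left untouched, as in the manual workaround.

```python
def stringify_nominal_categories(dataset):
    """Return a shallow copy of the dataset dict with all nominal categories as str.

    Assumes nominal attributes are (name, list_of_categories) pairs and every
    other attribute type is a plain string such as 'NUMERIC'.
    """
    result = dict(dataset)
    result["attributes"] = [
        (name, [str(c) for c in type_] if isinstance(type_, (list, tuple)) else type_)
        for name, type_ in dataset["attributes"]
    ]
    return result


data = dict(
    relation="dataset name",
    description="dataset description",
    attributes=[
        ("categorical_with_numeric_values", [1, 2, 3]),
        ("plain_numeric", "NUMERIC"),
    ],
    data=[[1, 0.5], [2, 1.5], [3, 2.5]],
)

fixed = stringify_nominal_categories(data)
print(fixed["attributes"])
```

The result can then be passed to arff.dump in place of the original dict, for example arff.dump(stringify_nominal_categories(data), fh).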
I understand there is currently no work being done on the package, but I figured I would document the bug and workaround.