messytables guesses wrong type for decimal number #190

@wrinklenose

Describe the bug
messytables should guess decimal values correctly according to the configured locale.
For example, in Germany the comma is used as the decimal separator, but the value 1,200 is guessed as type "text".

I first noticed this in CKAN, where it was initially reported as ckan/ckan#5769.

The type guessing seems to happen here: https://github.com/okfn/messytables/blob/51b736892a48e420ab313675f54901c77b446dec/messytables/types.py
and appears to be locale-specific. I think the magic happens in line 100:
value = locale.atof(value)
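
To illustrate what a decimal check that does not depend on the process locale could look like, here is a minimal sketch. It is not the messytables implementation: `parse_decimal` and `looks_like_decimal` are hypothetical helpers, and the default separators (comma as decimal point, dot as thousands separator) are an assumption matching German conventions.

```python
def parse_decimal(text, decimal_point=",", thousands_sep="."):
    """Hypothetical helper: parse a number written with explicit separators.

    The separators are passed in by the caller instead of being read from
    the process locale. Grouping positions are not validated, so this is
    more lenient than a strict parser would be.
    """
    cleaned = text.strip()
    if thousands_sep:
        # Drop grouping characters, e.g. "1.234,5" -> "1234,5".
        cleaned = cleaned.replace(thousands_sep, "")
    if decimal_point != ".":
        # Normalise the decimal separator to the dot that float() expects.
        cleaned = cleaned.replace(decimal_point, ".")
    return float(cleaned)


def looks_like_decimal(text, decimal_point=",", thousands_sep="."):
    """Return True if the text parses as a number with the given separators."""
    try:
        parse_decimal(text, decimal_point, thousands_sep)
    except ValueError:
        return False
    return True
```

With these defaults, `parse_decimal('1,200')` yields `1.2`, so a guesser built on such a check would no longer fall back to "text" for that value.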

Unfortunately, Python still seems to treat the dot as the decimal point even when a German locale is set, which I could reproduce in my local environment:

>>> locale.getlocale()
('de_DE', 'cp1252')
>>> locale.atof('1,200')

Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    locale.atof('1,200')
  File "C:\Program Files\Python27\lib\locale.py", line 318, in atof
    return func(string)
ValueError: invalid literal for float(): 1,200
>>> locale.localeconv()
{'mon_decimal_point': '', 'int_frac_digits': 127, 'p_sep_by_space': 127, 'frac_digits': 127, 'thousands_sep': '', 'n_sign_posn': 127, 'decimal_point': '.', 'int_curr_symbol': '', 'n_cs_precedes': 127, 'p_sign_posn': 127, 'mon_thousands_sep': '', 'negative_sign': '', 'currency_symbol': '', 'n_sep_by_space': 127, 'mon_grouping': [], 'p_cs_precedes': 127, 'positive_sign': '', 'grouping': []}
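
One possible explanation for the output above, offered as an assumption rather than a confirmed diagnosis: `locale.getlocale()` without arguments reports the `LC_CTYPE` category, whereas `locale.atof` follows the `LC_NUMERIC` conventions returned by `locale.localeconv()`. The `'decimal_point': '.'` entry suggests that `LC_NUMERIC` may still be the default "C" locale. The sketch below uses two hypothetical helpers, `numeric_locale_info` and `atof_with_locale`, to inspect both categories and to call `locale.atof` with `LC_NUMERIC` switched temporarily. Locale names are platform-specific and must be installed, so any German locale name passed in is an assumption about the system.

```python
import locale


def numeric_locale_info():
    """Hypothetical helper: show the categories relevant to number parsing."""
    return {
        # getlocale() with no argument reports LC_CTYPE, not LC_NUMERIC.
        "LC_CTYPE": locale.getlocale(locale.LC_CTYPE),
        "LC_NUMERIC": locale.getlocale(locale.LC_NUMERIC),
        # This is the separator that locale.atof() actually uses.
        "decimal_point": locale.localeconv()["decimal_point"],
    }


def atof_with_locale(text, name):
    """Hypothetical helper: run locale.atof() with LC_NUMERIC set to `name`.

    Raises locale.Error if the named locale is not available on the system.
    The previous LC_NUMERIC setting is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)
```

If a German locale is installed, calling `atof_with_locale('1,200', ...)` with that locale's name would show whether `LC_NUMERIC` is the missing piece. Note that `setlocale` changes process-wide state, so this pattern is not safe to use from multiple threads.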
