@fserb fserb commented Dec 7, 2010

Hey,
I was trying to use GTFS to parse some UTF-8 data, and it was failing with a UnicodeEncodeError.
I traced this down to two factors:

  1. unmapped_entities.py was converting string attributes with str(), which tries to encode all unicode values as 'ascii'.
  2. csv.reader doesn't handle unicode input (the Python 2 csv module works on byte strings).
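
The first factor can be reproduced outside the library. The snippet below is a minimal sketch, not code from this PR: in Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec, and the explicit encode("ascii") call here mimics that step. The stop name is a made-up example.

```python
# Hypothetical stop name containing one non-ASCII character.
stop_name = u"S\u00e3o Paulo"

error = None
try:
    # Python 2's str(stop_name) performs this ASCII encoding implicitly.
    stop_name.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc

# The failure is the same kind of error reported above.
print(type(error).__name__)
```

Keeping the value as text (unicode() in Python 2) skips the ASCII encoding step entirely, which is why the special case in the fix avoids the error.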

My first commit adds UTF-8 (non-ASCII) characters to one entry in the Stops test data, which breaks the tests.

My second commit fixes both issues and makes the tests pass again:
To fix 1, I've added a special case for str in unmapped_entities that converts with unicode() instead of str().
To fix 2, I've created a unicode_csv_reader function that wraps csv.reader and codecs.iterdecode. The steps here are a bit annoying: iterdecode() the input from UTF-8, encode it back to UTF-8 so csv.reader is fine with it, then take the output from csv.reader and decode each value from UTF-8, so the final output is unicode.
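
The steps for fix 2 could look roughly like the sketch below. It is an illustration of the described round trip, not the code from the commit: only the name unicode_csv_reader comes from the description above, while the parameters, the PY2 flag, and the sample rows are assumptions. On Python 3, csv.reader accepts text directly, so the sketch skips the encode/decode round trip there.

```python
import codecs
import csv
import sys

PY2 = sys.version_info[0] == 2


def unicode_csv_reader(byte_lines, encoding="utf-8", **kwargs):
    """Yield CSV rows as lists of text values from an iterable of byte lines."""
    # Step 1: decode the raw input incrementally.
    text_lines = codecs.iterdecode(byte_lines, encoding)
    if PY2:
        # Step 2: re-encode as UTF-8 bytes, which Python 2's csv.reader expects.
        utf8_lines = (line.encode("utf-8") for line in text_lines)
        for row in csv.reader(utf8_lines, **kwargs):
            # Step 3: decode every cell back to unicode.
            yield [cell.decode("utf-8") for cell in row]
    else:
        # Python 3's csv.reader works on text, so no round trip is needed.
        for row in csv.reader(text_lines, **kwargs):
            yield row


# Made-up sample data shaped like a stops file.
sample = [
    u"stop_id,stop_name\n".encode("utf-8"),
    u"S1,S\u00e3o Paulo\n".encode("utf-8"),
]
rows = list(unicode_csv_reader(sample))
```

Extra keyword arguments are passed through to csv.reader, so options such as delimiter keep working with the wrapper.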

Thanks for your attention,
[]s
F.

@Lawouach

Any chance this is fixed at some point?
