Unicode Key Causes Encoding Error with Log Statement #10

@jacinda

I noticed this while using qr (which is great, btw) with Django, which uses unicode for everything. I ended up using something like q = Queue(u'my_key') without realizing it at first, because my_key was a variable and not a string I had hard-coded. It also only broke when the value being popped had been pickled into bytes outside the ASCII range.

This error occurs because of the combination of cPickle protocol 1 with a unicode key. There are a couple of solutions to the bug. Let me know which you prefer and I'll submit a patch.

Here is a detailed description.

Because of the way _pack is defined using protocol 1, cPickle uses a binary format for serialization:

def _pack(self, val):
    """Prepares a message to go into Redis"""
    return self.serializer.dumps(val, 1)

When the log statement below is then executed on popping, a UnicodeDecodeError is raised if the key is a unicode string and the pickled value in popped contains any byte greater than 127 (0x7f).

log.debug('Popped ** %s ** from key ** %s **' % (popped, self.key))

Here is an example:

>>> import cPickle
>>> x = cPickle.dumps(128, 1)
>>> x
'K\x80.'
>>> u = u'unicode string'
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 11: ordinal not in range(128)
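
The failure comes from Python 2's implicit coercion: when a byte-string template is formatted with a unicode argument, the result is promoted to unicode and the byte-string pieces are decoded with the ASCII codec. Here is a minimal sketch of that step, doing the decode explicitly so that it runs on both Python 2 and Python 3. It uses the pickle module rather than cPickle, which is an assumption made for portability.

```python
import pickle

# Protocol 1 stores small integers as raw bytes: 'K' + one byte + '.'
payload = pickle.dumps(128, 1)

try:
    # Python 2 performs the equivalent of this decode implicitly when a
    # byte string is combined with a unicode string during '%' formatting.
    payload.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(repr(payload))
print(decode_error)
```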

This does not fail if protocol 0 (the cPickle default) is used:

>>> x = cPickle.dumps(128)
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
u'Popped ** I128\n. ** from key ** unicode string **'
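
Protocol 0 avoids the error here because it writes integers as ASCII text. Here is a small sketch comparing the two protocols. is_ascii is a hypothetical helper, and the check only covers plain integers, so it says nothing about other value types that protocol 0 may serialize differently.

```python
import pickle


def is_ascii(payload):
    """Return True if every byte of the pickled payload is below 128."""
    try:
        payload.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


# Protocol 0 is text-based; protocol 1 is binary.
text_ok = all(is_ascii(pickle.dumps(n, 0)) for n in range(1000))
binary_ok = is_ascii(pickle.dumps(128, 1))

print(text_ok, binary_ok)
```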

It also does not fail if the unicode string is explicitly encoded as ascii:

>>> x = cPickle.dumps(128, 1)
>>> 'Popped ** %s ** from key ** %s **' % (x, u.encode('ascii'))
'Popped ** K\x80. ** from key ** unicode string **'

Either changing the pickling protocol or encoding the key explicitly would work, and I can submit either as a patch (or something else you suggest, if both of these are less than ideal). Let me know which solution you prefer.
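
As a rough illustration of the explicit-encoding option, the key could be normalized to a byte string once, before it is stored or logged. normalize_key is a hypothetical helper, not part of qr, and the ASCII default simply mirrors the u.encode('ascii') call above. A key with non-ASCII characters would need a different encoding, such as UTF-8.

```python
def normalize_key(key, encoding='ascii'):
    """Return key as a byte string, encoding it only if it is text.

    Hypothetical helper for illustration; not part of qr.
    """
    if isinstance(key, bytes):
        # Already a byte string (a plain str on Python 2).
        return key
    return key.encode(encoding)


print(repr(normalize_key(u'my_key')))
```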
