This repository was archived by the owner on Apr 15, 2024. It is now read-only.

@tataganesh tataganesh commented Aug 22, 2017

Sometimes, when the _load_data function in cmapdb.py is called, the following error is raised:

  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 832, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 843, in render_contents
    self.init_resources(resources)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 347, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 195, in get_font
    font = self.get_font(None, subspec)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 186, in get_font
    font = PDFCIDFont(self, spec)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 668, in __init__
    self.unicode_map = CMapDB.get_unicode_map(self.cidcoding, self.cmap.is_vertical())
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/cmapdb.py", line 283, in get_unicode_map
    data = klass._load_data('to-unicode-%s' % name)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/cmapdb.py", line 253, in _load_data
    if os.path.exists(path):
  File "/home/ganesh/.virtualenvs/cv/lib/python2.7/genericpath.py", line 26, in exists
    os.stat(path)
TypeError: stat() argument 1 must be encoded string without null bytes, not str

The snippet in question:

    def _load_data(klass, name):
        filename = '%s.pickle.gz' % name
        logging.info('loading: %r' % name)
        cmap_paths = (os.environ.get('CMAP_PATH', '/usr/share/pdfminer/'),
                      os.path.join(os.path.dirname(__file__), 'cmap'),)
        for directory in cmap_paths:
            path = os.path.join(directory, filename)
            if os.path.exists(path):

Printing the variable name gives me:
to-unicode-PDFXC30-Identity
Printing repr(name) gives me:
to-unicode-PDFXC30-Identity\x00\x00
The trailing \x00 (NUL) characters appear to be the cause: they are invisible when printed, but os.stat() rejects any path that contains a null byte. One fix that worked for me was to strip them before the filename is built:
name = name.replace('\0', '')
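
For anyone who wants to try the workaround in isolation, here is a minimal sketch of the same idea outside pdfminer. The names `sanitize_cmap_name` and `find_cmap_file` are hypothetical helpers written for this illustration, not part of pdfminer's API; the lookup loop only mirrors the shape of the `_load_data` snippet above. It assumes the NUL bytes are padding that can be safely dropped.

```python
import os


def sanitize_cmap_name(name):
    """Hypothetical helper: drop NUL characters from a CMap name.

    Assumes the NULs are padding from the PDF and carry no meaning.
    """
    return name.replace('\0', '')


def find_cmap_file(name, cmap_paths):
    """Hypothetical helper: return the first existing '<name>.pickle.gz'
    under cmap_paths, or None if no directory contains it.

    The name is sanitized first so the path handed to os.path.exists()
    never contains a null byte.
    """
    filename = '%s.pickle.gz' % sanitize_cmap_name(name)
    for directory in cmap_paths:
        path = os.path.join(directory, filename)
        if os.path.exists(path):
            return path
    return None
```

With this in place, a name such as `'to-unicode-PDFXC30-Identity\x00\x00'` resolves to the same file as the clean name. The TypeError in the traceback is what Python 2.7 raises for such a path; as far as I know, recent Python 3 releases report the path as missing instead of raising, so the sanitizing step should still be needed there for the lookup to succeed.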
