deleted letters from PDF even if those letters were present in the source document  #27

@jakubsiast

I used pdf-redactor to change some text in a PDF file, but in part of the PDF I lost all the 'n' characters.
The affected text was not the text I intended to change. In "class TextToken", the "str(self)" function handled it as unchanged text, i.e., it passed the condition "if self.value == self.original_value:". Nevertheless, it came out changed. I traced the problem to the call to "PdfString.from_bytes(...)" at line 379 of pdf_redactor.py:
# If unchanged, return the raw original value without decoding/encoding.
return PdfString.from_bytes(self.raw_original_value)
By forcing the encoding of the unchanged TextToken to 'hex' I managed to fix the issue:
return PdfString.from_bytes(self.raw_original_value, bytes_encoding = 'hex')
This simple change helped in my case, but I do not know whether it holds in general. Can you try it and, if it works for you, push the fix to your code?
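
For context on why the 'hex' setting might behave differently, here is a minimal standard-library sketch of the two PDF string notations: literal strings in parentheses, which rely on backslash escapes, and hexadecimal strings in angle brackets, which spell out every byte as two hex digits. The helper names below are hypothetical and this is not pdfrw's implementation. I have not confirmed that escape handling is the actual cause of the missing 'n' characters; that is only an assumption.

```python
import binascii


def to_literal_pdf_string(raw: bytes) -> str:
    # Hypothetical helper: PDF literal string, e.g. (text).
    # Backslash and parentheses must be escaped with a backslash;
    # non-printable bytes are written as three-digit octal escapes.
    out = []
    for b in raw:
        ch = chr(b)
        if ch in "\\()":
            out.append("\\" + ch)
        elif 32 <= b < 127:
            out.append(ch)
        else:
            out.append("\\%03o" % b)
    return "(" + "".join(out) + ")"


def to_hex_pdf_string(raw: bytes) -> str:
    # Hypothetical helper: PDF hexadecimal string, e.g. <6e>.
    # Every byte becomes two hex digits, so no escaping is involved.
    return "<" + binascii.hexlify(raw).decode("ascii") + ">"


def from_hex_pdf_string(s: str) -> bytes:
    # Inverse of to_hex_pdf_string. Per the PDF string rules, a missing
    # final digit is treated as 0.
    digits = "".join(s.strip()[1:-1].split())
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


raw = b"n\\n("  # bytes: 'n', backslash, 'n', '('
print(to_literal_pdf_string(raw))
print(to_hex_pdf_string(raw))
print(from_hex_pdf_string(to_hex_pdf_string(raw)) == raw)
```

In the hex form a backslash followed by 'n' is just two ordinary bytes, so a reader cannot mistake it for an escape sequence. That could be one reason the 'hex' setting keeps the text intact, but it needs checking against the affected PDF.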
