perf: pdfminer to fitz in convert#630 #685

Open
thinklab wants to merge 7 commits into main from
feat/pdfminer-to-fitz-in-convert#630
Conversation

@thinklab thinklab commented Jun 8, 2023

closes #630

@thinklab thinklab requested a review from serereg June 8, 2023 08:44
@thinklab thinklab changed the title from "Feat/pdfminer to fitz in convert#630" to "perf: pdfminer to fitz in convert#630" Jun 8, 2023
return new_x0, new_y0, new_x1, new_y1


class PlainPDFToBadgerdocTokensConverterPytz:
As I understand it, this class should replace the PlainPDFToBadgerdocTokensConverter class.

        self.offset = 0
        self.page_size: Optional[PageSize] = None

    def _convert_span(self, span):  # type: ignore
All `# type: ignore` comments need to be removed. To make it easier to determine the types, you can set a breakpoint here and inspect the variable's type in debug mode.
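As a rough illustration of what that could look like, here is a minimal sketch. It assumes `span` is one of the span dictionaries produced by PyMuPDF's `page.get_text("dict")`, with the documented keys `text`, `font`, `size`, `flags`, `color`, `origin` and `bbox`. The names `Span`, `describe_span` and `span_text_and_bbox` are hypothetical and not part of this PR, and the sample values are made up so the snippet runs without a PDF.

```python
from typing import Tuple, TypedDict

BBox = Tuple[float, float, float, float]


class Span(TypedDict, total=False):
    """Assumed subset of the keys in a PyMuPDF text span dict."""

    text: str
    font: str
    size: float
    flags: int
    color: int
    origin: Tuple[float, float]
    bbox: BBox


def describe_span(span: Span) -> str:
    """List the runtime type of each field, as a debugger would show it."""
    return ", ".join(
        f"{key}: {type(value).__name__}" for key, value in sorted(span.items())
    )


def span_text_and_bbox(span: Span) -> Tuple[str, BBox]:
    """Hypothetical typed stand-in for the part of _convert_span that reads a span."""
    return span["text"], span["bbox"]


# Hand-written sample standing in for a real span (values are illustrative only).
sample: Span = {
    "text": "Hello",
    "font": "Helvetica",
    "size": 12.0,
    "flags": 0,
    "color": 0,
    "origin": (72.0, 710.0),
    "bbox": (72.0, 700.0, 100.0, 712.0),
}

print(describe_span(sample))
print(span_text_and_bbox(sample))
```

With an annotation like `span: Span` on the method parameter, the `# type: ignore` on `_convert_span` may no longer be needed, provided the real span dicts match the assumed shape.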

Development

Successfully merging this pull request may close these issues.

[MAJOR] fitz should be much faster than pdfminer

2 participants