feat:remove header and footer #53

Open
wangxinbiao wants to merge 1 commit into kubeagi:main from wangxinbiao:main

Conversation

@wangxinbiao (Collaborator) commented Apr 9, 2024

What type of PR is this?

What this PR does / why we need it

remove header and footer
extract images
extract tables

Which issue(s) this PR fixes

Special notes for your reviewer


[project.optional-dependencies]
dev = ["black==23.3.0", "pylint==3.1.0"]
experiment = [
Collaborator

@wangxinbiao @Lanture1064 Please make sure this is appropriate.

Collaborator Author

unstructured is not used when processing PDFs, so this dependency can be removed.

@nkwangleiGIT

Please add documentation describing how each of the points below is handled:

  1. remove header and footer
  2. extract images
  3. extract tables
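
For readers following along, here is a minimal sketch of one common way to approach point 1. It is not taken from this PR's diff: the function name strip_repeated_margins, its parameters, and the "lines that repeat near the page edge" heuristic are all assumptions made for illustration. It assumes each page has already been extracted as a list of text lines.

```python
from collections import Counter


def strip_repeated_margins(pages, depth=2, min_ratio=0.6):
    """Drop lines that recur in the top/bottom `depth` lines of most pages.

    Hypothetical helper: `pages` is a list of pages, each a list of text lines.
    """
    if len(pages) < 2:
        # With a single page there is nothing to compare against.
        return [list(page) for page in pages]

    def normalize(line):
        # Mask digits so "Page 3" and "Page 4" count as the same line.
        return "".join("#" if ch.isdigit() else ch for ch in line).strip()

    # Count, per page, which normalized lines appear in the margins.
    counts = Counter()
    for page in pages:
        margin = {normalize(line) for line in page[:depth]}
        margin.update(normalize(line) for line in page[-depth:])
        margin.discard("")
        counts.update(margin)

    threshold = min_ratio * len(pages)
    repeated = {text for text, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        last = len(page) - 1
        kept = []
        for i, line in enumerate(page):
            in_margin = i < depth or i > last - depth
            if in_margin and normalize(line) in repeated:
                continue  # treated as a header or footer
            kept.append(line)
        cleaned.append(kept)
    return cleaned


pages = [
    ["ACME Report", "Intro text", "More body", "Page 1"],
    ["ACME Report", "Second body", "Page 2"],
    ["ACME Report", "Third body", "Other", "Page 3"],
]
result = strip_repeated_margins(pages)
```

Only lines inside the margin window are candidates for removal, so a phrase that happens to repeat in the body text is left alone. The depth and min_ratio defaults are arbitrary starting points and would need tuning per document set.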

@wangxinbiao
Collaborator Author

Related documentation has been added. @nkwangleiGIT

3 participants