Skip to content

added updates to fix issue #358#365

Merged
MaksymTkachuk merged 1 commit intomasterfrom
10feb2026-maksym-issue-extract-tables
Feb 12, 2026
Merged

added updates to fix issue #358#365
MaksymTkachuk merged 1 commit intomasterfrom
10feb2026-maksym-issue-extract-tables

Conversation

@MaksymTkachuk
Copy link
Collaborator

@MaksymTkachuk MaksymTkachuk commented Feb 10, 2026

#358
While testing with the document from issue #358 using this code, I encountered an error. I’ve added some fixes to address it.

cv = Converter('err_table.pdf')
tables = cv.extract_tables()
cv.close()

@jamie-lemon
Copy link
Collaborator

@MaksymTkachuk Looks like there is a failing test in here - @julian-smith-artifex-com might have some ideas

@MaksymTkachuk
Copy link
Collaborator Author

@jamie-lemon
I checked the log.
https://github.com/ArtifexSoftware/pdf2docx/actions/runs/21879142339/job/63156371745?pr=365
I updated the version of PyMuPDF to the latest 1.26.7, but it`s not found in the available version list.
I will set it to 1.24.11.
Thanks.

@julian-smith-artifex-com
Copy link
Collaborator

julian-smith-artifex-com commented Feb 10, 2026

  1. I think we should keep the dependency on PyMuPDF>=1.26.7. I don't know why it failed, but it works fine for me locally, so maybe it was a Github or pypi glitch.

  2. How come test_extract_tables_issue_358_err_table() was added and then removed? It seems odd given that this PR claims to fix issue table convert bug #358?

  3. We probably need to change setup.py to say `python_requires=">=3.10" as 3.10 is the oldest supported python at the moment.

  4. I've added support for pdf2docx to aptest. See that page for documentation.

  5. I tested this PR's original code with:

    ./aptest/aptest.py --pdf2docx 'git:--sha f23caa037f4068664ecf180e250e8ef9f1b6f754' build test

    And test_extract_tables_issue_358_err_table() failed with:

    pymupdf.FileNotFoundError: no such file: '/home/jules/artifex/aptest-git-pdf2docx/test/samples/issue-358-err_table.pdf'

    This looks like the input PDF file needs to be added to git?

Copy link
Collaborator

@julian-smith-artifex-com julian-smith-artifex-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please see my comments on the main page.

setup.py Outdated
zip_safe=False,
install_requires=load_requirements("requirements.txt"),
python_requires=">=3.6",
python_requires=">=3.14",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? Current supported python versions are 3.10-3.14 so i would have expected >=3.10 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I am using 3.11. Is it ok?

Copy link
Collaborator Author

@MaksymTkachuk MaksymTkachuk Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I updated it from 3.11 to 3.10.

@MaksymTkachuk
Copy link
Collaborator Author

MaksymTkachuk commented Feb 11, 2026

Hi @jamie-lemon @julian-smith-artifex-com
I updated setup.py and test.yml files to pass the checks on github.

  1. I upgraded PyMuPDF version to latest in requirements.txt
  2. Updated workflow v2->v4.
  3. Use Python 3.10 (Docker and matrix) so PyMuPDF 1.26.7 can be installed.
  4. Install setuptools and wheel before python setup.py develop.
  5. Use unique artifact names per Python version: outputs-py3.10, outputs-py3.11, outputs-py3.12 to avoid conflicts between tests.
    As you can see, it passes all tests succesfully.

Copy link
Collaborator

@julian-smith-artifex-com julian-smith-artifex-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about points 2 and 5 in my earlier message? (About the added+removed test_extract_tables_issue_358_err_table() test, and the seemingly missing input file test/samples/issue-358-err_table.pdf.)

Comment on lines 56 to 57
python-version: ["3.8", "3.9", "3.10"]
python-version: ["3.10", "3.11", "3.12"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably test on all supported python versions, currently 3.10-3.14.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added python versions.
["3.10", "3.11", "3.12"] -> ["3.10", "3.11", "3.12", "3.13", "3.14"]

@MaksymTkachuk
Copy link
Collaborator Author

What do you think about points 2 and 5 in my earlier message? (About the added+removed test_extract_tables_issue_358_err_table() test, and the seemingly missing input file test/samples/issue-358-err_table.pdf.)

I added test_extract_tables_issue_358_err_table() into unit-test. it's working now.

@MaksymTkachuk MaksymTkachuk force-pushed the 10feb2026-maksym-issue-extract-tables branch 4 times, most recently from e831e40 to d570be7 Compare February 12, 2026 16:55
- Update GitHub Actions workflows to use Python 3.10+ (required for PyMuPDF 1.26.7)
- Add setuptools/wheel installation in test workflow
- Fix artifact naming conflict by including Python version in artifact names
- Update PyPI publish action to use stable release/v1 tag
- Fixed bug when extracting tables with empty cells, see test_extract_tables_empty_cell().
- Update requirements.txt to use PyMuPDF>=1.26.7
- Update setup.py python_requires to >=3.10
@MaksymTkachuk MaksymTkachuk force-pushed the 10feb2026-maksym-issue-extract-tables branch from d570be7 to 33a6ded Compare February 12, 2026 17:03
@MaksymTkachuk MaksymTkachuk merged commit a03f251 into master Feb 12, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants