Installation failed: wrong version of numpy, missing fitz package. #214

@PeterBocan


Hello all, I am struggling with installing the dependencies on an Apple M1.

I followed the instructions you outlined in the documentation, but running the following script fails:

python project/pdf2markdown/scripts/run_project.py --config project/pdf2markdown/configs/pdf2markdown.yaml

This is the error:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.5 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/Users/pbocan/Projects/mort/PDF-Extract-Kit/project/pdf2markdown/scripts/run_project.py", line 5, in <module>
    from pdf2markdown import PDF2MARKDOWN
  File "/Users/pbocan/Projects/mort/PDF-Extract-Kit/project/pdf2markdown/scripts/pdf2markdown.py", line 6, in <module>
    import torch
  File "/Users/pbocan/miniconda/envs/pdf-extract-kit-1.0/lib/python3.10/site-packages/torch/__init__.py", line 1477, in <module>
    from .functional import *  # noqa: F403
  File "/Users/pbocan/miniconda/envs/pdf-extract-kit-1.0/lib/python3.10/site-packages/torch/functional.py", line 9, in <module>
    import torch.nn.functional as F
  File "/Users/pbocan/miniconda/envs/pdf-extract-kit-1.0/lib/python3.10/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/Users/pbocan/miniconda/envs/pdf-extract-kit-1.0/lib/python3.10/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "/Users/pbocan/miniconda/envs/pdf-extract-kit-1.0/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
/Users/pbocan/miniconda/envs/pdf-extract-kit-1.0/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
Traceback (most recent call last):
  File "/Users/pbocan/Projects/mort/PDF-Extract-Kit/project/pdf2markdown/scripts/run_project.py", line 5, in <module>
    from pdf2markdown import PDF2MARKDOWN
  File "/Users/pbocan/Projects/mort/PDF-Extract-Kit/project/pdf2markdown/scripts/pdf2markdown.py", line 12, in <module>
    from pdf_extract_kit.utils.data_preprocess import load_pdf
  File "/Users/pbocan/Projects/mort/PDF-Extract-Kit/project/pdf2markdown/scripts/../../../pdf_extract_kit/utils/data_preprocess.py", line 1, in <module>
    import fitz
ModuleNotFoundError: No module named 'fitz'

Could you please update the documentation/dependencies so that the installation works? Thank you.
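
For reference, the two failures in the log (a NumPy 2.x install next to a torch build compiled against NumPy 1.x, and a missing `fitz` module) can be checked independently of the project with a small script. This is only a sketch: the helper names `numpy_major`, `missing_modules` and `report` are made up for illustration, and it assumes that `fitz` is meant to be provided by the PyMuPDF distribution, which is not confirmed by the project's documentation here.

```python
from importlib import metadata, util


def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '2.2.5'."""
    return int(version.split(".")[0])


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if util.find_spec(name) is None]


def report():
    """Collect human-readable findings about the active environment."""
    findings = []
    try:
        version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        findings.append("numpy is not installed")
    else:
        if numpy_major(version) >= 2:
            # The NumPy error text itself suggests 'numpy<2' as the easiest fix.
            findings.append(f"numpy {version} found; the error suggests 'numpy<2'")
    for name in missing_modules(["fitz"]):
        # Assumption: 'fitz' is expected to come from the PyMuPDF distribution.
        findings.append(f"module '{name}' is not importable")
    return findings


if __name__ == "__main__":
    for line in report() or ["no problems detected"]:
        print(line)
```

Run inside the `pdf-extract-kit-1.0` conda environment, this should list the same two problems as the traceback without importing torch.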
