diff --git a/README.md b/README.md
index 88b7786..ec69f20 100644
--- a/README.md
+++ b/README.md
@@ -11,8 +11,6 @@ Our approach is very simple (only 293 lines of code), but can almost perfectly p
 
 Average cost per page: $0.013
 
-This package use [GeneralAgent](https://github.com/CosmosShadow/GeneralAgent) lib to interact with OpenAI API.
-
 [pdfgpt-ui](https://github.com/daodao97/gptpdf-ui) is a visual tool based on gptpdf.
 
 
@@ -74,12 +72,13 @@ see [examples/gptpdf_Quick_Tour.ipynb](examples/gptpdf_Quick_Tour.ipynb)
 def parse_pdf(
         pdf_path: str,
         output_dir: str = './',
-        prompt: Optional[Dict] = None,
-        api_key: Optional[str] = None,
-        base_url: Optional[str] = None,
+        api_key: str = None,
+        base_url: str = None,
         model: str = 'gpt-4o',
-        verbose: bool = False,
-        gpt_worker: int = 1
+        gpt_worker: int = 1,
+        prompt: str = DEFAULT_PROMPT,
+        rect_prompt: str = DEFAULT_RECT_PROMPT,
+        role_prompt: str = DEFAULT_ROLE_PROMPT
 ) -> Tuple[str, List[str]]:
 ```
 
@@ -93,11 +92,11 @@ Parses a PDF file into a Markdown file and returns the Markdown content along wi
 - **output_dir**: *str*, default: './'
     Output directory to store all images and the Markdown file
 
-- **api_key**: *Optional[str]*, optional
-    OpenAI API key. If not provided, the `OPENAI_API_KEY` environment variable will be used.
+- **api_key**: *str*
+    OpenAI API key. If not provided through this parameter, it must be set via the `OPENAI_API_KEY` environment variable.
 
-- **base_url**: *Optional[str]*, optional
-    OpenAI base URL. If not provided, the `OPENAI_BASE_URL` environment variable will be used. This can be modified to call other large model services with OpenAI API interfaces, such as `GLM-4V`.
+- **base_url**: *str*, optional
+    OpenAI base URL. If not provided through this parameter, it must be set via the `OPENAI_BASE_URL` environment variable.
 
 - **model**: *str*, default: 'gpt-4o'
     OpenAI API formatted multimodal large model. If you need to use other models, such as:
@@ -106,44 +105,31 @@ Parses a PDF file into a Markdown file and returns the Markdown content along wi
   - [Yi-Vision](https://platform.lingyiwanwu.com/docs)
   - Azure OpenAI, by setting the `base_url` to `https://xxxx.openai.azure.com/` to use Azure OpenAI, where `api_key` is the Azure API key, and the model is similar to `azure_xxxx`, where `xxxx` is the deployed model name (tested).
 
-- **verbose**: *bool*, default: False
-    Verbose mode. When enabled, the content parsed by the large model will be displayed in the command line.
-
 - **gpt_worker**: *int*, default: 1
-    Number of GPT parsing worker threads. If your machine has better performance, you can increase this value to speed up the parsing.
+    Number of GPT worker threads used for parallel parsing. On a machine with better performance, increase this value to speed up parsing.
+
+- **prompt**: *str*, default: built-in prompt
+    Custom main prompt that guides how the model processes and converts text content in images.
 
-- **prompt**: *dict*, optional
-    If the model you are using does not match the default prompt provided in this repository and cannot achieve the best results, we support adding custom prompts. The prompts in the repository are divided into three parts:
-    - `prompt`: Mainly used to guide the model on how to process and convert text content in images.
-    - `rect_prompt`: Used to handle cases where specific areas (such as tables or images) are marked in the image.
-    - `role_prompt`: Defines the role of the model to ensure the model understands it is performing a PDF document parsing task.
+- **rect_prompt**: *str*, default: built-in prompt
+    Custom prompt for handling regions marked in the image (such as tables or images).
+
+- **role_prompt**: *str*, default: built-in prompt
+    Custom role-definition prompt that ensures the model understands it is performing a PDF document parsing task.
 
-You can pass custom prompts in the form of a dictionary to replace any of the prompts.
+You can pass custom prompt strings to replace any of the default prompts.
 
 Here is an example:
 ```
-prompt = {
-    "prompt": "Custom prompt text",
-    "rect_prompt": "Custom rect prompt",
-    "role_prompt": "Custom role prompt"
-}
-
 content, image_paths = parse_pdf(
     pdf_path=pdf_path,
     output_dir='./output',
     model="gpt-4o",
-    prompt=prompt,
-    verbose=False,
+    prompt="Custom main prompt for guiding model processing of image content",
+    rect_prompt="Custom area-specific prompt for handling marked regions (tables/images)",
+    role_prompt="Custom role definition prompt so the model understands its parsing task",
 )
 ```
-
-
-**args**: LLM other parameters, such as `temperature`, `top_p`, `max_tokens`, `presence_penalty`, `frequency_penalty`, etc.
-
-
-
-
-
 ## Join Us 👏🏻
 
 Scan the QR code below with WeChat to join our group chat or contribute.
diff --git a/README_CN.md b/README_CN.md
index 47a380b..25c94c9 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -88,7 +88,11 @@ def parse_pdf(
 - **base_url**:*str*,可选
     OpenAI 基本 URL。如果未通过此参数提供,则必须通过 `OPENAI_BASE_URL` 环境变量设置。可用于配置自定义 OpenAI API 端点。
 
-- **model**:*str*,默认值:'gpt-4o'。OpenAI API 格式的多模态大模型。如果需要使用其他模型,例如
+- **model**:*str*,默认值:'gpt-4o'。OpenAI API 格式的多模态大模型。如果需要使用其他模型,例如:
+  - [qwen-vl-max](https://help.aliyun.com/zh/dashscope/developer-reference/compatibility-of-openai-with-dashscope)
+  - [GLM-4V](https://open.bigmodel.cn/dev/api#glm-4v)
+  - [Yi-Vision](https://platform.lingyiwanwu.com/docs)
+  - Azure OpenAI,通过设置 `base_url` 为 `https://xxxx.openai.azure.com/` 来使用 Azure OpenAI,其中 `api_key` 是 Azure API 密钥,模型类似为 `azure_xxxx`,其中 `xxxx` 是已部署模型的名称(已测试)。
 
 - **gpt_worker**:*int*,默认值:1
     GPT 解析工作线程数。如果您的机器性能较好,可以适当调高,以提高解析速度。
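For reviewers, the migration this patch asks of callers (old dict-based `prompt` argument and removed `verbose` flag, new `prompt` / `rect_prompt` / `role_prompt` string parameters) can be sketched as follows. This is a minimal sketch, assuming `gptpdf` is installed and `OPENAI_API_KEY` is set in the environment; `build_parse_kwargs` is a hypothetical helper written for illustration, not part of the library, and the prompt text is a placeholder rather than the built-in default.

```python
def build_parse_kwargs(pdf_path: str,
                       output_dir: str = './output',
                       model: str = 'gpt-4o',
                       gpt_worker: int = 1,
                       **prompt_overrides: str) -> dict:
    """Assemble keyword arguments for gptpdf.parse_pdf under the new
    signature, rejecting parameters the patch removed (e.g. verbose)."""
    allowed = {'prompt', 'rect_prompt', 'role_prompt'}
    unknown = set(prompt_overrides) - allowed
    if unknown:
        raise ValueError(f'unsupported parameter(s): {sorted(unknown)}')
    kwargs = {'pdf_path': pdf_path, 'output_dir': output_dir,
              'model': model, 'gpt_worker': gpt_worker}
    # Any prompt not overridden is simply omitted, so parse_pdf falls
    # back to its built-in DEFAULT_* prompt for that parameter.
    kwargs.update(prompt_overrides)
    return kwargs

kwargs = build_parse_kwargs('paper.pdf',
                            role_prompt='You are a PDF document parsing assistant.')
# from gptpdf import parse_pdf
# content, image_paths = parse_pdf(**kwargs)  # requires a valid API key
```

Note that callers who previously passed `verbose=False` (as the old example did) must drop that argument entirely; under the new signature it would raise a `TypeError`.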