57 changes: 22 additions & 35 deletions README.md
@@ -11,8 +11,6 @@ Our approach is very simple (only 293 lines of code), but can almost perfectly p

 Average cost per page: $0.013

-This package use [GeneralAgent](https://github.com/CosmosShadow/GeneralAgent) lib to interact with OpenAI API.
-
 [pdfgpt-ui](https://github.com/daodao97/gptpdf-ui) is a visual tool based on gptpdf.


@@ -74,12 +72,13 @@ see [examples/gptpdf_Quick_Tour.ipynb](examples/gptpdf_Quick_Tour.ipynb)
 def parse_pdf(
     pdf_path: str,
     output_dir: str = './',
-    prompt: Optional[Dict] = None,
-    api_key: Optional[str] = None,
-    base_url: Optional[str] = None,
+    api_key: str = None,
+    base_url: str = None,
     model: str = 'gpt-4o',
-    verbose: bool = False,
-    gpt_worker: int = 1
+    gpt_worker: int = 1,
+    prompt: str = DEFAULT_PROMPT,
+    rect_prompt: str = DEFAULT_RECT_PROMPT,
+    role_prompt: str = DEFAULT_ROLE_PROMPT
 ) -> Tuple[str, List[str]]:
 ```
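
The signature in this hunk defaults `api_key` and `base_url` to `None` and leaves the fallback to environment variables. A minimal sketch of that resolution order follows. `resolve_openai_config` is a hypothetical helper, not part of gptpdf, and it assumes that a missing key is an error while a missing base URL is simply left as `None`.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_openai_config(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: explicit arguments win, environment variables are the fallback."""
    env = os.environ if env is None else env
    key = api_key or env.get("OPENAI_API_KEY")
    url = base_url or env.get("OPENAI_BASE_URL")
    if not key:
        # Assumption: a missing key is treated as a hard error.
        raise ValueError("Pass api_key or set OPENAI_API_KEY")
    return key, url
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.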

@@ -93,11 +92,11 @@ Parses a PDF file into a Markdown file and returns the Markdown content along wi
 - **output_dir**: *str*, default: './'
   Output directory to store all images and the Markdown file

-- **api_key**: *Optional[str]*, optional
-  OpenAI API key. If not provided, the `OPENAI_API_KEY` environment variable will be used.
+- **api_key**: *str*
+  OpenAI API key. If not provided through this parameter, it must be set via the `OPENAI_API_KEY` environment variable.

-- **base_url**: *Optional[str]*, optional
-  OpenAI base URL. If not provided, the `OPENAI_BASE_URL` environment variable will be used. This can be modified to call other large model services with OpenAI API interfaces, such as `GLM-4V`.
+- **base_url**: *str*, optional
+  OpenAI base URL. If not provided through this parameter, it must be set via the `OPENAI_BASE_URL` environment variable.

 - **model**: *str*, default: 'gpt-4o'
   OpenAI API formatted multimodal large model. If you need to use other models, such as:
@@ -106,44 +105,32 @@ Parses a PDF file into a Markdown file and returns the Markdown content along wi
   - [Yi-Vision](https://platform.lingyiwanwu.com/docs)
   - Azure OpenAI, by setting the `base_url` to `https://xxxx.openai.azure.com/` to use Azure OpenAI, where `api_key` is the Azure API key, and the model is similar to `azure_xxxx`, where `xxxx` is the deployed model name (tested).

-- **verbose**: *bool*, default: False
-  Verbose mode. When enabled, the content parsed by the large model will be displayed in the command line.
-
 - **gpt_worker**: *int*, default: 1
-  Number of GPT parsing worker threads. If your machine has better performance, you can increase this value to speed up the parsing.
+  Number of GPT workers to use for parallel processing. You can set it according to your model and GPU memory.
+
+- **prompt**: *str*, default: Use built-in prompts
+  Custom main prompt for guiding model processing of image content.

-- **prompt**: *dict*, optional
-  If the model you are using does not match the default prompt provided in this repository and cannot achieve the best results, we support adding custom prompts. The prompts in the repository are divided into three parts:
-  - `prompt`: Mainly used to guide the model on how to process and convert text content in images.
-  - `rect_prompt`: Used to handle cases where specific areas (such as tables or images) are marked in the image.
-  - `role_prompt`: Defines the role of the model to ensure the model understands it is performing a PDF document parsing task.
+- **rect_prompt**: *str*, default: Use built-in prompts
+  Custom area-specific prompts for handling marked regions (tables/images).
+
+- **role_prompt**: *str*, default: Use built-in prompts
+  Custom role definition prompt to ensure model understands its parsing task.

-You can pass custom prompts in the form of a dictionary to replace any of the prompts. Here is an example:
-
 ```
-prompt = {
-    "prompt": "Custom prompt text",
-    "rect_prompt": "Custom rect prompt",
-    "role_prompt": "Custom role prompt"
-}
-
 content, image_paths = parse_pdf(
     pdf_path=pdf_path,
     output_dir='./output',
     model="gpt-4o",
-    prompt=prompt,
+    prompt="Custom main prompt for guiding model processing of image content",
+    rect_prompt="Custom area-specific prompts for handling marked regions (tables/images)",
+    role_prompt="Custom role definition prompt to ensure model understands its parsing task",
     verbose=False,
 )
 ```



**args**: LLM other parameters, such as `temperature`, `top_p`, `max_tokens`, `presence_penalty`, `frequency_penalty`, etc.





## Join Us 👏🏻

Scan the QR code below with WeChat to join our group chat or contribute.
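
The README.md changes above replace the single `prompt` dictionary with three separate string parameters. For callers that still hold a dictionary in the old form, a small adapter could bridge the two styles. This is a sketch only: `prompt_dict_to_kwargs` is a hypothetical name, not something gptpdf provides, and it assumes the new keyword names are exactly `prompt`, `rect_prompt`, and `role_prompt` as shown in the diff.

```python
from typing import Dict, Optional


def prompt_dict_to_kwargs(prompt: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical migration helper: map the old `prompt` dict onto the new keyword arguments."""
    allowed = ("prompt", "rect_prompt", "role_prompt")
    if not prompt:
        return {}
    unknown = set(prompt) - set(allowed)
    if unknown:
        raise KeyError(f"Unknown prompt keys: {sorted(unknown)}")
    # Forward only the keys the caller set, so built-in defaults still apply to the rest.
    return {name: prompt[name] for name in allowed if name in prompt}


# Intended use, assuming the signature from this diff:
# parse_pdf(pdf_path, output_dir="./output", **prompt_dict_to_kwargs(old_prompt))
```

Rejecting unknown keys up front surfaces typos such as `rect_promt` instead of silently falling back to the default prompt.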
6 changes: 5 additions & 1 deletion README_CN.md
@@ -88,7 +88,11 @@ def parse_pdf(
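 - **base_url**: *str*, optional
   OpenAI base URL. If not provided through this parameter, it must be set via the `OPENAI_BASE_URL` environment variable. Can be used to configure a custom OpenAI API endpoint.

-- **model**: *str*, default: 'gpt-4o'. OpenAI API formatted multimodal large model. If you need to use other models, such as
+- **model**: *str*, default: 'gpt-4o'. OpenAI API formatted multimodal large model. If you need to use other models, such as:
+  - [qwen-vl-max](https://help.aliyun.com/zh/dashscope/developer-reference/compatibility-of-openai-with-dashscope)
+  - [GLM-4V](https://open.bigmodel.cn/dev/api#glm-4v)
+  - [Yi-Vision](https://platform.lingyiwanwu.com/docs)
+  - Azure OpenAI, by setting the `base_url` to `https://xxxx.openai.azure.com/` to use Azure OpenAI, where `api_key` is the Azure API key, and the model is similar to `azure_xxxx`, where `xxxx` is the deployed model name (tested).

 - **gpt_worker**: *int*, default: 1
   Number of GPT parsing worker threads. If your machine has better performance, you can increase this value to speed up the parsing.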
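
Both READMEs describe `gpt_worker` as a count of parallel workers. One plausible reading, sketched below, is a thread pool that issues one model call per page. This is an illustration under that assumption, not gptpdf's actual implementation: `parse_pages` and the `parse_page` callback are hypothetical names standing in for whatever sends a page image to the model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parse_pages(
    page_images: List[str],
    parse_page: Callable[[str], str],
    gpt_worker: int = 1,
) -> List[str]:
    """Illustrative only: run one model call per page image on `gpt_worker` threads."""
    with ThreadPoolExecutor(max_workers=gpt_worker) as pool:
        # Executor.map yields results in input order, so pages stay in sequence.
        return list(pool.map(parse_page, page_images))
```

Because each call mostly waits on a remote API, threads are usually enough here; the practical ceiling is more likely the provider's rate limit than local hardware.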