| Model Provider | Models | Open / Paid | Example Code | Doc |
|---|---|---|---|---|
| Anthropic | claude-opus-4-20250514, claude-sonnet-4-20250514, claude-3-7-sonnet-20250219, claude-3-5-sonnet-20241022 | Paid | Code | Doc |
| Gemini | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite-preview-06-17, gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.0-pro-exp-02-05 | Paid | Code | Doc |
| OpenAI | gpt-4.1-2025-04-14, gpt-4.1-mini-2025-04-14, gpt-4o, gpt-4o-mini | Paid | Code | Doc |
| Mistral-OCR | mistral-ocr | Paid | Code | Doc |
| OmniAI | omniai | Paid | Code | Doc |
| Google & Meta | gemma3:4b, gemma3:12b, gemma3:27b, x/llama3.2-vision:11b | Open Weight | Code | Gemma Doc, Llama3.2 Doc |
| IBM | SmolDocling-256M-preview | Open Weight | Code | Doc |
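The paid providers in the table all follow the same basic pattern: render a PDF page to an image, base64-encode it, and send it to the vision model with an extraction prompt. A minimal sketch using the `anthropic` client as one example (the helper names and the prompt text are illustrative, not part of the repo):

```python
import base64

def encode_image(path: str) -> str:
    """Base64-encode an image file for a vision-model request."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_ocr_message(image_b64: str,
                      prompt: str = "Extract all text from this page as markdown.") -> list:
    """Assemble the messages payload for a Claude vision request."""
    return [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": prompt},
        ],
    }]

# Actual call (requires ANTHROPIC_API_KEY to be set):
# import anthropic
# client = anthropic.Anthropic()
# resp = client.messages.create(
#     model="claude-3-5-sonnet-20241022",
#     max_tokens=4096,
#     messages=build_ocr_message(encode_image("page.png")),
# )
# print(resp.content[0].text)
```

The other paid providers differ only in the client library and payload shape; the open-weight models swap the API call for a local Ollama or Hugging Face inference call.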
Dependencies (`requirements.txt`):

```
# UI
streamlit>=1.43.2
# SmolDocling related
docling_core>=2.23.1
# LLM related libraries
ollama>=0.4.7
openai>=1.66.3
anthropic>=0.49.0
google-genai>=1.5.0
# Huggingface library
transformers>=4.49.0
# Utilities
python-dotenv>=1.0.1
pillow>=11.1.0
requests>=2.32.3
torch>=2.6.0
```
- Python 3.9 or higher
- pip (Python package installer)
Clone the repository:

```
git clone https://github.com/genieincodebottle/parsemypdf.git
cd parsemypdf
```

Create a virtual environment:

```
python -m venv venv
venv\Scripts\activate   # On Linux -> source venv/bin/activate
```

Install dependencies:

```
pip install -r requirements.txt
```
Rename `.env.example` to `.env` and update the required environment variables as needed:

```
ANTHROPIC_API_KEY=your_key_here   # For Claude
OPENAI_API_KEY=your_key_here      # For OpenAI
GOOGLE_API_KEY=your_key_here      # For Google's Gemini models
MISTRAL_API_KEY=your_key_here     # For Mistral
OMNI_API_KEY=your_key_here        # For OmniAI
```
For ANTHROPIC_API_KEY follow this -> https://console.anthropic.com/settings/keys
For OPENAI_API_KEY follow this -> https://platform.openai.com/api-keys
For GOOGLE_API_KEY follow this -> https://ai.google.dev/gemini-api/docs/api-key
For MISTRAL_API_KEY follow this -> https://console.mistral.ai/api-keys
For OMNI_API_KEY follow this -> https://app.getomni.ai/settings/account
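Once `.env` is in place, the app's Python code can pick the keys up via `python-dotenv` (already listed in `requirements.txt`). A minimal sketch; the `require_key` helper is illustrative, not part of the repo:

```python
import os

# load_dotenv() reads key=value pairs from .env into the environment;
# guarded so the sketch also runs where python-dotenv is not installed.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

def require_key(name: str) -> str:
    """Fetch an API key from the environment, failing loudly if missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set - add it to your .env file")
    return value

# Example: only needed when using the corresponding provider
# anthropic_key = require_key("ANTHROPIC_API_KEY")
```

Only the key for the provider you actually select needs to be present; the local Ollama and SmolDocling paths need no key at all.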
Install Ollama & Models (for local processing)
Install Ollama
- For Windows - download Ollama from the following location (requires Windows 10 or later) -> https://ollama.com/download/windows
- For Linux (command line) - `curl https://ollama.ai/install.sh | sh`
Pull the required vision language models as per your system capacity (command line):
- ollama pull gemma3:4b
- ollama pull gemma3:12b
- ollama pull gemma3:27b
- ollama pull x/llama3.2-vision:11b
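With a model pulled, the `ollama` Python package (also in `requirements.txt`) can run OCR fully locally. A hedged sketch; the helper and prompt are illustrative, and the commented-out call needs the Ollama server running:

```python
def build_vision_chat(model: str, image_path: str,
                      prompt: str = "Extract all text from this image as markdown.") -> dict:
    """Assemble keyword arguments for ollama.chat() with an image attached."""
    return {
        "model": model,
        "messages": [
            # The ollama client accepts image file paths via the "images" key.
            {"role": "user", "content": prompt, "images": [image_path]},
        ],
    }

# Actual call (requires a running local Ollama server and a pulled model):
# import ollama
# resp = ollama.chat(**build_vision_chat("gemma3:4b", "page.png"))
# print(resp["message"]["content"])
```

Pick the model tag to match your hardware: the 4b variant runs on modest GPUs or CPU, while 27b needs substantially more memory.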
To review each vision language model powered OCR in its own Web UI, navigate to `parsemypdf/llm_ocr/<provider_folder>` (e.g., `claude`) and run:

```
streamlit run main.py
```
To review all the vision language model powered OCR options in a single Web UI, navigate to the root folder `parsemypdf` and run:

```
streamlit run vlm_ocr_app.py
```