A smart Jupyter Notebook tool that uses Google Gemini AI to convert complex PDFs into clean, unformatted DOCX files. Features intelligent rate limiting for free-tier APIs, interactive UI, and auto-removal of headers/footers.

NeelPatra/PDF2DocX_ConvertAI

📄 AI-Powered PDF to DOCX Converter

A robust, AI-driven tool that extracts text from PDF documents and converts them into clean, unformatted DOCX files using Google's Gemini AI models.

🌟 Features

  • AI-Powered Extraction: Uses Google's advanced Gemini Flash models (1.5/2.0) for high-accuracy OCR.
  • Smart Cleaning: Automatically removes repetitive headers, footers, and page numbers.
  • Formatting Removal: Flattens complex PDF layouts into a linear Word document.
  • Rate Limit Handling: Throttles requests to stay within free-tier API rate limits.
  • Privacy Focused: API keys are input securely and never stored.
  • Interactive UI: Uses ipywidgets for configuration.
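
The Smart Cleaning feature can be illustrated with a small sketch. This is not the notebook's actual code: it assumes the extracted text is available as one string per page, and the names `strip_repeated_lines`, `PAGE_NUMBER`, and `min_share` are hypothetical. The idea is that a line appearing on most pages is probably a running header or footer, and a line holding only a number is probably a page number.

```python
import re
from collections import Counter

# Lines that are only a page number, e.g. "12", "Page 12", "- 12 -", "12 / 40".
PAGE_NUMBER = re.compile(r"^\W*(page\s+)?\d+(\s*(/|of)\s*\d+)?\W*$", re.IGNORECASE)


def strip_repeated_lines(pages, min_share=0.6):
    """Drop page numbers and lines that recur on at least min_share of pages.

    pages: list of strings, one per page of extracted text (an assumption).
    """
    # Count each distinct line once per page, so a line repeated inside
    # a single page is not mistaken for a running header.
    seen_on = Counter()
    for page in pages:
        seen_on.update({line.strip() for line in page.splitlines() if line.strip()})

    # Require at least two pages so a one-page document loses nothing.
    threshold = max(2, min_share * len(pages))
    repeated = {line for line, count in seen_on.items() if count >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            line
            for line in page.splitlines()
            if line.strip() not in repeated and not PAGE_NUMBER.match(line.strip())
        ]
        cleaned.append("\n".join(kept).strip())
    return cleaned
```

A heuristic like this can also remove a body line that happens to contain only a number, so the threshold and pattern would need tuning against real documents.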

🚀 Quick Start

✅ Prerequisites

  • A Google account, needed to obtain a Gemini API key.
  • Python 3.

📥 Installation

git clone https://github.com/NeelPatra/PDF2DocX_ConvertAI.git
cd PDF2DocX_ConvertAI
pip install -r requirements.txt

👨🏻‍💻 Usage

  1. Open the notebook:
     jupyter notebook PDF2DocX_ConvertAI.ipynb
  2. Run initialization cells.
  3. Open configuration form.
  4. Paste API key.
  5. Load models.
  6. Upload PDF.
  7. Run conversion.
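
During the conversion step, requests to a free-tier API have to be spaced out. The following is a minimal sketch of one way to do that, not the notebook's actual implementation. The class `MinIntervalLimiter`, the helper `convert_pages`, and the `extract_text` callback are hypothetical names, and the requests-per-minute value is whatever quota applies to your key.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed, then reserve that slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()  # re-read in case sleep overshot
        self._next_allowed = now + self.interval


def convert_pages(pages, extract_text, limiter):
    """Call extract_text(page) for each page, throttled by limiter.

    extract_text stands in for the real API call (hypothetical).
    """
    results = []
    for page in pages:
        limiter.wait()
        results.append(extract_text(page))
    return results
```

With `MinIntervalLimiter(per_minute=15)`, for example, calls are spaced four seconds apart. The value 15 is only an illustration, not a statement of Gemini's actual quota.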

🛠️ Tech Stack

  • Core Logic: Python 3
  • AI Model: Google Gemini
  • Document Handling: python-docx
  • Interface: ipywidgets
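
As a rough illustration of the document-handling layer, the sketch below writes extracted text to a DOCX file as plain, unstyled paragraphs. It assumes python-docx is installed and that the text arrives as a single string with blank lines between paragraphs. The function names `split_paragraphs` and `write_plain_docx` are hypothetical and not taken from the notebook.

```python
def split_paragraphs(text):
    """Split text on blank lines and collapse line breaks inside each block."""
    blocks = text.replace("\r\n", "\n").split("\n\n")
    return [" ".join(block.split()) for block in blocks if block.strip()]


def write_plain_docx(text, out_path):
    """Write text to out_path as unstyled paragraphs using python-docx."""
    # Imported here so split_paragraphs stays usable without python-docx.
    from docx import Document

    doc = Document()
    for paragraph in split_paragraphs(text):
        doc.add_paragraph(paragraph)  # default style, no extra formatting
    doc.save(out_path)
```

Calling `write_plain_docx(text, "output.docx")` would then produce a file with one Word paragraph per block of text.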

🤝 Contributing

Pull requests and issues are accepted.

📄 License

MIT License.
