ONGC PDF Image Extractor & Classifier

This project aims to automate the extraction of images from well completion report PDFs and classify them into predefined categories using machine learning techniques.

📌 Table of Contents

Project Overview
Features
Usage
GIF Demonstration
Future Improvements
Acknowledgments
Report
License
Contact

📌 Project Overview

This project automates the extraction and classification of images from well construction report PDFs using machine learning and natural language processing (NLP) techniques.

The system:

Detects figures with captions, figures without captions, and graphs in PDFs using a YOLOv8 model.
Extracts the captions associated with detected figures.
Classifies each caption with a Logistic Regression-based NLP model into one of the following classes:
- Contour_Maps
- Drilling_Plots
- Geological_Map
- Geotechnical_Order
- Location_Map
- Log_Motif
- Remote_Sensing_Image
- Seismic_Section
- Stratigraphy_and_Casing_Plot
- Structural_Map
- Well_Construction_Diagram
- Well_Schematic_Diagram
- Others
Organizes the output into structured directories.
Provides a GUI-based interaction using PyQt6.
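
The caption classification step can be illustrated with a minimal sketch. It assumes scikit-learn is installed and combines TF-IDF features with Logistic Regression. The training captions, the `train_caption_classifier` helper, and the hyperparameters are invented for illustration and are not taken from this repository's trained model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training captions (hypothetical); a real model needs far more data.
TRAIN = [
    ("Location map of the drilled well", "Location_Map"),
    ("Location map showing block boundary", "Location_Map"),
    ("Seismic section along inline 1200", "Seismic_Section"),
    ("Interpreted seismic section through the well", "Seismic_Section"),
    ("Well schematic diagram with casing depths", "Well_Schematic_Diagram"),
    ("Schematic diagram of the completed well", "Well_Schematic_Diagram"),
]


def train_caption_classifier(samples):
    """Fit a TF-IDF + Logistic Regression pipeline on (caption, label) pairs."""
    texts = [text for text, _ in samples]
    labels = [label for _, label in samples]
    model = make_pipeline(
        # Unigrams and bigrams, lowercased; assumed settings, not the project's.
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


model = train_caption_classifier(TRAIN)
prediction = model.predict(["Fig. 3: Seismic section across the structure"])[0]
print(prediction)
```

A caption that matches none of the named classes would presumably fall into Others; how the project decides that is not shown here.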

📌 Features

✅ Object Detection: Uses YOLOv8 to detect figures (labeled/unlabeled) and graphs.
✅ Caption Extraction: Extracts captions near detected images using PyMuPDF.
✅ NLP-based Classification: Classifies captions using TF-IDF + Logistic Regression.
✅ Automated Processing: Processes multiple PDFs at once.
✅ User-Friendly GUI: A PyQt6 interface for browsing PDFs and viewing results.
✅ Structured Output: Saves extracted images and captions in organized folders.
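
As a rough illustration of finding a caption near a detected figure, the sketch below picks the closest text block directly beneath a bounding box. It assumes the text blocks come as tuples in the layout returned by PyMuPDF's `page.get_text("blocks")`, that is `(x0, y0, x1, y1, text, block_no, block_type)`, and that the detection box has already been converted to page coordinates. The `find_caption` function, the `max_gap` threshold, and the sample data are hypothetical, and the project's actual heuristic may differ.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal span shared by two boxes (0 if disjoint)."""
    return max(0.0, min(a[2], b[2]) - max(a[0], b[0]))


def find_caption(figure: Box, blocks: Sequence[tuple],
                 max_gap: float = 40.0) -> Optional[str]:
    """Return the text of the nearest block below the figure, or None."""
    best_text, best_gap = None, max_gap
    for x0, y0, x1, y1, text, *_ in blocks:
        gap = y0 - figure[3]  # distance from figure bottom to block top
        if gap < 0 or gap > best_gap:
            continue  # block is above the figure or farther than the best so far
        if horizontal_overlap(figure, (x0, y0, x1, y1)) <= 0:
            continue  # block sits in a different column
        best_text, best_gap = " ".join(text.split()), gap
    return best_text


# Invented sample data in the PyMuPDF "blocks" layout.
figure_box = (100, 100, 400, 300)
text_blocks = [
    (100, 50, 400, 90, "3. GEOLOGY\n", 0, 0),
    (120, 310, 380, 325, "Fig. 2: Location map\nof well X\n", 1, 0),
    (100, 340, 400, 400, "The well was drilled ...\n", 2, 0),
    (450, 305, 550, 320, "Side note\n", 3, 0),
]
caption = find_caption(figure_box, text_blocks)
print(caption)
```

Joining the block's lines with single spaces handles simple multi-line captions that fall within one text block; captions split across several blocks would need extra merging logic.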

🚀 Usage

Prerequisites

Python 3.6+

Clone the Repository

```
git clone https://github.com/mhsuhail00/ONGC-PDF-Image-Classification.git
cd ONGC-PDF-Image-Classification
```

Install Required Dependencies

pip install -r requirements.txt

Run the Application

python main.py

Output Directory

captured_images
└───PDF_file_name
    ├───figure_without_label
    │   ├───page_1_object_2.png
    │   └───page_2_object_6.png
    ├───figure_with_label
    │   ├───Contour_Maps
    │   │   ├───page_1_object_1.png
    │   │   └───page_1_object_1.txt
    │   ├───Drilling_Plots
    │   ├───Geological_Map
    │   ├───Geotechnical_Order
    │   ├───Location_Map
    │   ├───Log_Motif
    │   ├───Others
    │   ├───Remote_Sensing_Image
    │   ├───Seismic_Section
    │   ├───Stratigraphy_and_Casing_Plot
    │   ├───Structural_Map
    │   ├───Well_Construction_Diagram
    │   └───Well_Schematic_Diagram
    └───graph
        ├───page_1_object_3.png
        └───page_1_object_4.png
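
The layout above can be reproduced with a small path helper. This is only a sketch of the naming scheme visible in the tree. The `output_path` function and its parameters are hypothetical and are not taken from `main.py`.

```python
from pathlib import Path
from typing import Optional


def output_path(root: Path, pdf_name: str, detection: str,
                page: int, index: int,
                category: Optional[str] = None) -> Path:
    """Build the PNG path for one detected object.

    `detection` is one of "figure_with_label", "figure_without_label",
    or "graph". Only captioned figures get a category subfolder.
    """
    folder = root / pdf_name / detection
    if detection == "figure_with_label":
        # Assumption: captions with no assigned class go to "Others".
        folder = folder / (category or "Others")
    return folder / f"page_{page}_object_{index}.png"


image = output_path(Path("captured_images"), "PDF_file_name",
                    "figure_with_label", 1, 1, "Contour_Maps")
caption_file = image.with_suffix(".txt")  # caption text stored beside the image
graph = output_path(Path("captured_images"), "PDF_file_name", "graph", 1, 3)

print(image.as_posix())
print(caption_file.as_posix())
print(graph.as_posix())
```

Calling `image.parent.mkdir(parents=True, exist_ok=True)` before saving would create any missing folders.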

📸 GIF Demonstration

📌 Future Improvements

Enhancing Caption Extraction – Improving accuracy for multi-line captions.
Deep Learning-Based Classification – Exploring transformer-based NLP models.
Extending Image Classification – Using CNNs for better image categorization.
GUI Enhancements – Adding real-time progress tracking.

🎯 Acknowledgments

This project was developed as part of an Industrial Training at ONGC GEOPIC Centre, Dehradun, under the guidance of Mr. Sanjay Chakravorty, Dy. General Manager (Programming), ONGC.

Author: Mohammad Suhail
Institution: Zakir Husain College of Engineering & Technology, Aligarh Muslim University

📄 Report

You can view the detailed report of this project here:
Project Report

📜 License

This project is licensed under the Apache License.

📩 Contact

Developer: Mohammad Suhail
Email: mhsuhail00@gmail.com
GitHub Profile: mhsuhail00
