Qwen2-VL Document Classification Pipeline

Welcome to the Qwen2-VL Document Classification Pipeline project! This repository contains a pipeline that classifies documents by type using the Qwen2-VL-2B-GPTQ-INT4 model.

Project Overview

This pipeline uses a vision-language model to classify documents into predefined categories. It accepts image and PDF inputs, processes each document as an image, and outputs classification reports and confusion matrices.

Key Features:

Multi-page PDF handling with vertical merging of pages for seamless processing.
Optimized prompt engineering for domain-specific accuracy.
Automatic evaluation with detailed reports and visualizations.
Simple execution in Google Colab with no additional setup required.
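
The vertical merging of PDF pages mentioned in the feature list can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the names `stack_layout` and `merge_pages` are hypothetical, the pages are assumed to be already rendered as Pillow images, and narrower pages are assumed to be left-aligned on a white canvas.

```python
from typing import List, Tuple


def stack_layout(sizes: List[Tuple[int, int]]) -> Tuple[int, int, List[int]]:
    """Compute the canvas size and per-page y-offsets for vertical stacking.

    `sizes` holds (width, height) for each page, in page order.
    """
    if not sizes:
        raise ValueError("at least one page is required")
    canvas_width = max(width for width, _ in sizes)
    offsets = []
    y = 0
    for _, height in sizes:
        offsets.append(y)  # top edge of this page on the canvas
        y += height
    return canvas_width, y, offsets


def merge_pages(pages):
    """Paste Pillow images one below another on a white canvas.

    Assumption: `pages` is a non-empty list of PIL.Image.Image objects.
    Pillow is imported here so stack_layout stays usable without it.
    """
    from PIL import Image

    width, height, offsets = stack_layout([page.size for page in pages])
    canvas = Image.new("RGB", (width, height), "white")
    for page, y in zip(pages, offsets):
        canvas.paste(page, (0, y))
    return canvas


print(stack_layout([(800, 1000), (600, 900)]))  # (800, 1900, [0, 1000])
```

Long PDFs produce very tall images, so a real pipeline may need to cap the number of pages or downscale the merged result before classification.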

Requirements

Hardware Requirements:

Minimum CPU RAM: 5 GB
Minimum GPU RAM: 8 GB
Additional disk space for input files and outputs.

Setup

No complex installations or dependencies! Open the notebook in Google Colab, upload your files, and run the cells sequentially.

Usage in Google Colab

Open the Notebook: Open the project notebook in Google Colab using this link.
Upload Your Files: Place your documents in the appropriate folders and mount your Google Drive.
Run the Notebook: Execute the cells sequentially to:
- Load the model and processor.
- Convert PDFs to images.
- Classify documents and generate reports.
Download Results: Results (Excel file, confusion matrix) are saved in the outputs directory for easy access.

Folder Structure

Ensure your file structure follows this format for proper pipeline execution:

root_directory/
    ├── azure_files/
    │   ├── bill_of_lading/
    │   ├── customs_document/
    │   ├── delivery_receipt/
    │   ├── invoice/
    │   └── ... (other categories)
    └── outputs/
        ├── classification_results.xlsx
        └── confusion_matrix.png
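
Because each category has its own subfolder, the expected label of a file can be read from its parent folder name. The sketch below shows one way to collect (file, label) pairs from such a layout. The function name `collect_labeled_files` and the set of accepted extensions are assumptions for illustration; the notebook may do this differently.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed set of accepted extensions; the README only says "image and PDF".
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_labeled_files(data_dir: str) -> List[Tuple[Path, str]]:
    """Return (file_path, expected_category) pairs in a stable, sorted order."""
    pairs = []
    for category_dir in sorted(Path(data_dir).iterdir()):
        if not category_dir.is_dir():
            continue  # ignore stray files next to the category folders
        for path in sorted(category_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in SUPPORTED:
                # The folder name doubles as the ground-truth label.
                pairs.append((path, category_dir.name))
    return pairs
```

With the layout above, calling `collect_labeled_files("root_directory/azure_files")` would yield one pair per document, labeled `bill_of_lading`, `invoice`, and so on.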

Pipeline Steps

Load Model and Processor: The pipeline uses the Qwen2-VL-2B-GPTQ-INT4 model for document classification.
PDF Conversion and Image Merging: Multi-page PDFs are converted to images, and the pages are stacked vertically into a single image per document before being passed to the model.
Prompt Engineering: The prompt includes domain-specific patterns and keywords to improve classification accuracy.
Evaluation:
- Generates confusion matrices and classification reports.
- Produces color-coded Excel files for results.
Visualization: Heatmaps and other plots summarize model performance.
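
The prompt engineering step can be illustrated with a small sketch that builds a keyword-guided prompt and maps the model's free-text answer back to a category. Everything here is an assumption for illustration: the category names are taken from the folder structure, while the keyword hints, the prompt wording, and the helpers `build_prompt` and `parse_category` are hypothetical and not the notebook's actual content. The model response is assumed to be plain text.

```python
from typing import Dict, List

# Category names follow the folder structure; the keyword hints are
# illustrative assumptions, not the notebook's actual prompt content.
CATEGORY_HINTS: Dict[str, List[str]] = {
    "bill_of_lading": ["shipper", "consignee", "port of loading"],
    "customs_document": ["customs", "tariff", "declaration"],
    "delivery_receipt": ["received by", "delivery date"],
    "invoice": ["invoice number", "amount due"],
}


def build_prompt(hints: Dict[str, List[str]]) -> str:
    """Build a text prompt listing each category with its typical keywords."""
    lines = [
        "Classify the document image into exactly one category.",
        "Categories and typical keywords:",
    ]
    for name, keywords in hints.items():
        lines.append(f"- {name}: {', '.join(keywords)}")
    lines.append("Answer with the category name only.")
    return "\n".join(lines)


def parse_category(response: str, categories: List[str], fallback: str = "unknown") -> str:
    """Map free-form model output to a known category name."""
    normalized = response.strip().lower().replace(" ", "_").replace("-", "_")
    # Check longer names first so a name that contains another one still wins.
    for name in sorted(categories, key=len, reverse=True):
        if name in normalized:
            return name
    return fallback


print(build_prompt(CATEGORY_HINTS))
print(parse_category("Bill of Lading.", list(CATEGORY_HINTS)))  # bill_of_lading
```

Returning a fallback label instead of raising keeps a batch run going when the model answers outside the category list; such cases then show up explicitly in the evaluation.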

Results and Reporting

Outputs Include:

Classification Accuracy: Per-category and overall performance.
Confusion Matrix: Heatmap of expected vs predicted classifications.
Excel Reports: Color-coded Excel files summarizing results.
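
To make the reported numbers concrete, the sketch below computes confusion counts, overall accuracy, and per-category accuracy from lists of expected and predicted labels using only the standard library. The helper names are hypothetical, and "per-category accuracy" is assumed here to mean the fraction of documents of a category that were predicted correctly (recall). The notebook may instead rely on a library such as scikit-learn, whose `confusion_matrix` and `classification_report` functions cover the same ground.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_counts(expected: List[str], predicted: List[str]) -> Counter:
    """Count how often each (expected, predicted) label pair occurs."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    return Counter(zip(expected, predicted))


def accuracy_report(expected: List[str], predicted: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and the per-category fraction of correct predictions."""
    if not expected:
        raise ValueError("no samples to evaluate")
    counts = confusion_counts(expected, predicted)
    totals = Counter(expected)  # number of documents per true category
    per_category = {
        label: counts[(label, label)] / total for label, total in totals.items()
    }
    overall = sum(counts[(label, label)] for label in totals) / len(expected)
    return overall, per_category


expected = ["invoice", "invoice", "bill_of_lading", "invoice"]
predicted = ["invoice", "bill_of_lading", "bill_of_lading", "invoice"]
overall, per_category = accuracy_report(expected, predicted)
print(overall)  # 0.75
```

The off-diagonal entries of `confusion_counts`, such as `("invoice", "bill_of_lading")`, are the cells a heatmap would highlight as misclassifications.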

Future Enhancements

Integration with OCR for enhanced text extraction.
Support for multilingual document classification.
Fine-tuning with additional labeled datasets.
Adoption of advanced models like LayoutLMv3 for complex layouts.

Contributing

We welcome contributions from the community! To contribute:

Fork the repository.
Create a feature branch.
Submit a pull request with a detailed explanation of your changes.

Start Classifying with Ease! 🚀
