DocVision is a Streamlit‐based inference UI for document layout analysis. It supports three segmentation methods (XY‑Cut, DocStrum, Hybrid) with LightGBM classification and a Faster R‑CNN detector.
- DocLayNet Core & Extra (COCO format)
- Link: DocLayNet
- Sample images for inference only are included under `Segmentation/test_images/`.
- The full dataset is large (~28 GB) and should be downloaded separately if you need to train or evaluate on the full set.
- LightGBM (`.pkl`, < 100 MB): included in the repo under `Classification/`.
- Faster R‑CNN (`.pth`, > 100 MB): hosted on Hugging Face at https://huggingface.co/pmodi08/DocVision-Models
- The code automatically downloads them at runtime using `huggingface_hub`.
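For reference, this kind of runtime download is what `hf_hub_download` provides: it fetches a file from a Hub repo and caches it locally. A minimal sketch, assuming a hypothetical weight filename (check the Hugging Face repo for the actual name):

```python
# Sketch: fetch model weights from the Hugging Face Hub at runtime.
# "fasterrcnn_doclaynet.pth" is a hypothetical filename for illustration only.
def get_weights(filename: str = "fasterrcnn_doclaynet.pth") -> str:
    """Download a weight file (or reuse the cached copy); returns the local path."""
    from huggingface_hub import hf_hub_download  # lazy import
    return hf_hub_download(repo_id="pmodi08/DocVision-Models", filename=filename)
```

Repeated runs reuse the local cache (`~/.cache/huggingface` by default), which is why mounting that directory into the Docker container (see below) speeds up restarts.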
To get started:

```bash
git clone https://github.com/Prahar08modi/DocVision.git
cd DocVision
```
Build the Docker image:

```bash
docker build -t docvision:latest .
```
Run the container (no HF token needed for the public repo):

```bash
docker run -it --rm \
  -p 8501:8501 \
  docvision:latest
```

The Streamlit UI will be hosted at http://localhost:8501
Persist the HF cache between runs (optional):

```bash
docker run -it --rm \
  -p 8501:8501 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  docvision:latest
```
If you prefer not to use Docker, you can run locally:

- Install dependencies:

```bash
pip install -r requirements.txt
```

Contents of `requirements.txt`:

```text
opencv-python
matplotlib
torchvision
streamlit
joblib
scikit-learn
tqdm
lightgbm
huggingface_hub
```
- Run Streamlit:

```bash
streamlit run UI/app_dl.py \
  --server.fileWatcherType none \
  --server.port 8501
```

The Streamlit UI will be hosted at http://localhost:8501
Use:

- Upload an image (PNG/JPG). Sample test images can be found under `Segmentation/test_images`.
- Click **Next: Classify zones** to run LightGBM.
- Expand “🚀 Try Faster R‑CNN” and check **Run Faster R‑CNN**.
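Under the hood, the “Classify zones” step amounts to loading the bundled LightGBM pickle and predicting one category per zone feature vector. A rough sketch, not the app’s exact code; the `.pkl` filename below is hypothetical (see `Classification/` for the real file):

```python
# Rough sketch of the zone-classification step (not the app's exact code).
# The model path below is hypothetical; check Classification/ for the real file.
def classify_zones(feature_rows, model_path="Classification/lightgbm_doclaynet.pkl"):
    import joblib  # lazy import: requires joblib and lightgbm to be installed
    model = joblib.load(model_path)     # unpickle the trained LightGBM model
    return model.predict(feature_rows)  # one predicted category per zone
```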
Below is the flowchart summarizing the LightGBM classification pipeline used in DocVision:
Description: This flowchart illustrates how each scanned document image flows through: binarization, segmentation (XY‑Cut, DocStrum, or Hybrid), feature extraction (text density, edge count, geometric and statistical features), classification via the trained LightGBM model, and finally overlaying the predicted categories on the original image for visualization.
If you want to retrain the classification models:

- Prepare your own subset of DocLayNet (COCO JSON + images).
- Download DocLayNet_Core from DocLayNet_Core.
- Store it under the `Dataset` directory in the project root.
- Train LightGBM:

```bash
cd Train
python train_lightgbm.py \
  --device cpu/gpu \
  --coco-json path/to/train.json \
  --image-dir path/to/images \
  --output updated_lightgbm_doclaynet.pkl
```
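The training script consumes COCO-format annotations. Purely for illustration (assuming the standard COCO keys, not the script’s actual parsing code), the per-zone training rows can be pulled from a loaded COCO dictionary like this:

```python
# Illustration: walk a loaded COCO-format dict and yield one row per annotation.
def coco_zones(coco: dict):
    """Yield (image_file, bbox, category_name) for each annotation."""
    images = {img["id"]: img["file_name"] for img in coco["images"]}
    cats = {c["id"]: c["name"] for c in coco["categories"]}
    for ann in coco["annotations"]:
        yield images[ann["image_id"]], ann["bbox"], cats[ann["category_id"]]

# Tiny in-memory example of the COCO structure:
sample = {
    "images": [{"id": 1, "file_name": "page_0.png"}],
    "categories": [{"id": 2, "name": "text"}],
    "annotations": [{"image_id": 1, "category_id": 2, "bbox": [10, 20, 100, 40]}],
}
rows = list(coco_zones(sample))  # [("page_0.png", [10, 20, 100, 40], "text")]
```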
- Train Faster R‑CNN (PyTorch):

```bash
cd Deep_Learning/FasterRCNN
```

Run the Jupyter Notebook `fasterRCNN.ipynb`.

- Upload any newly trained weights to Hugging Face as needed.
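Uploading retrained weights can also be scripted with `huggingface_hub`; a sketch using `HfApi.upload_file`, where the local path is a placeholder and a valid Hugging Face login with write access to the repo is assumed:

```python
# Sketch: push a locally trained weight file to a Hugging Face model repo.
# The local path below is a placeholder; write access to the repo is required.
def push_weights(local_path="updated_fasterrcnn.pth",
                 repo_id="pmodi08/DocVision-Models"):
    from huggingface_hub import HfApi  # lazy import; requires `huggingface-cli login`
    api = HfApi()
    api.upload_file(
        path_or_fileobj=local_path,  # file on disk to upload
        path_in_repo=local_path,     # destination filename inside the repo
        repo_id=repo_id,
    )
```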
