DocVision is a Streamlit‐based inference UI for document layout analysis. It supports three segmentation methods (XY‑Cut, DocStrum, Hybrid) with LightGBM classification and a Faster R‑CNN detector.
- DocLayNet Core & Extra (COCO format)
- Link: DocLayNet
- Sample images for inference only are included under `Segmentation/test_images/`.
- The full dataset is large (~28 GB) and should be downloaded separately if you need to train or evaluate on the full set.
- LightGBM (`.pkl`, < 100 MB): included in the repo under `Classification/`.
- Faster R‑CNN (`.pth`, > 100 MB): hosted on Hugging Face at https://huggingface.co/pmodi08/DocVision-Models
- The code automatically downloads them at runtime using `huggingface_hub`.
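For reference, this kind of runtime download is what `hf_hub_download` provides: it fetches a file from a Hub repo and caches it locally. A minimal sketch, assuming a hypothetical weight filename (check the Hugging Face repo for the actual name):

```python
# Sketch: fetch model weights from the Hugging Face Hub at runtime.
# "fasterrcnn_doclaynet.pth" is a hypothetical filename for illustration only.
def get_weights(filename: str = "fasterrcnn_doclaynet.pth") -> str:
    """Download a weight file (or reuse the cached copy); returns the local path."""
    from huggingface_hub import hf_hub_download  # lazy import
    return hf_hub_download(repo_id="pmodi08/DocVision-Models", filename=filename)
```

Repeated runs reuse the local cache (`~/.cache/huggingface` by default), which is why mounting that directory into the Docker container (see below) speeds up restarts.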
To get started:

```bash
git clone https://github.com/Prahar08modi/DocVision.git
cd DocVision
```
Build the Docker image:

```bash
docker build -t docvision:latest .
```
Run the container (no HF token needed for the public repo):

```bash
docker run -it --rm \
  -p 8501:8501 \
  docvision:latest
```

The Streamlit UI will be hosted at http://localhost:8501
Persist the HF cache between runs (optional):

```bash
docker run -it --rm \
  -p 8501:8501 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  docvision:latest
```
If you prefer not to use Docker, you can run locally:

- Install dependencies:

```bash
pip install -r requirements.txt
```

Contents of `requirements.txt`:

```text
opencv-python
matplotlib
torchvision
streamlit
joblib
scikit-learn
tqdm
lightgbm
huggingface_hub
```
- Run Streamlit:

```bash
streamlit run UI/app_dl.py \
  --server.fileWatcherType none \
  --server.port 8501
```

The Streamlit UI will be hosted at http://localhost:8501
Use:

- Upload an image (PNG/JPG). Sample test images can be found under `Segmentation/test_images`.
- Click **Next: Classify zones** to run LightGBM.
- Expand “🚀 Try Faster R‑CNN” and check **Run Faster R‑CNN**.
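Under the hood, the “Classify zones” step amounts to loading the bundled LightGBM pickle and predicting one category per zone feature vector. A rough sketch, not the app’s exact code; the `.pkl` filename below is hypothetical (see `Classification/` for the real file):

```python
# Rough sketch of the zone-classification step (not the app's exact code).
# The model path below is hypothetical; check Classification/ for the real file.
def classify_zones(feature_rows, model_path="Classification/lightgbm_doclaynet.pkl"):
    import joblib  # lazy import: requires joblib and lightgbm to be installed
    model = joblib.load(model_path)     # unpickle the trained LightGBM model
    return model.predict(feature_rows)  # one predicted category per zone
```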
Below is the flowchart summarizing the LightGBM classification pipeline used in DocVision:
Description: This flowchart illustrates how each scanned document image flows through: binarization, segmentation (XY‑Cut, DocStrum, or Hybrid), feature extraction (text density, edge count, geometric and statistical features), classification via the trained LightGBM model, and finally overlaying the predicted categories on the original image for visualization.
If you want to retrain the classification models:

- Prepare your own subset of DocLayNet (COCO JSON + images).
- Download DocLayNet_Core from DocLayNet_Core.
- Store it under the `Dataset` directory in the project root.
- Train LightGBM:

```bash
cd Train
python train_lightgbm.py \
  --device cpu/gpu \
  --coco-json path/to/train.json \
  --image-dir path/to/images \
  --output updated_lightgbm_doclaynet.pkl
```
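The training script consumes COCO-format annotations. Purely for illustration (assuming the standard COCO keys, not the script’s actual parsing code), the per-zone training rows can be pulled from a loaded COCO dictionary like this:

```python
# Illustration: walk a loaded COCO-format dict and yield one row per annotation.
def coco_zones(coco: dict):
    """Yield (image_file, bbox, category_name) for each annotation."""
    images = {img["id"]: img["file_name"] for img in coco["images"]}
    cats = {c["id"]: c["name"] for c in coco["categories"]}
    for ann in coco["annotations"]:
        yield images[ann["image_id"]], ann["bbox"], cats[ann["category_id"]]

# Tiny in-memory example of the COCO structure:
sample = {
    "images": [{"id": 1, "file_name": "page_0.png"}],
    "categories": [{"id": 2, "name": "text"}],
    "annotations": [{"image_id": 1, "category_id": 2, "bbox": [10, 20, 100, 40]}],
}
rows = list(coco_zones(sample))  # [("page_0.png", [10, 20, 100, 40], "text")]
```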
- Train Faster R‑CNN (PyTorch):

```bash
cd Deep_Learning/FasterRCNN
```

Run the Jupyter Notebook `fasterRCNN.ipynb`.

- Upload any newly trained weights to Hugging Face as needed.
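Uploading retrained weights can also be scripted with `huggingface_hub`; a sketch using `HfApi.upload_file`, where the local path is a placeholder and a valid Hugging Face login with write access to the repo is assumed:

```python
# Sketch: push a locally trained weight file to a Hugging Face model repo.
# The local path below is a placeholder; write access to the repo is required.
def push_weights(local_path="updated_fasterrcnn.pth",
                 repo_id="pmodi08/DocVision-Models"):
    from huggingface_hub import HfApi  # lazy import; requires `huggingface-cli login`
    api = HfApi()
    api.upload_file(
        path_or_fileobj=local_path,  # file on disk to upload
        path_in_repo=local_path,     # destination filename inside the repo
        repo_id=repo_id,
    )
```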
