Handwriting CRNN

A deep learning project for recognizing handwritten text using a CRNN (Convolutional Recurrent Neural Network).

The goal is to build an end-to-end system that converts handwritten text images into machine-readable text. The final system will be capable of processing entire handwritten pages, detecting individual text lines, and recognizing them.

Project Status

🚧 Early development stage

Current progress:

Environment setup complete
CUDA-enabled PyTorch configured
IAM dataset parsing and preprocessing complete
dataset.csv generated with image paths and labels

Requirements

Python 3.12
PyTorch
CUDA (optional but recommended for training)
NVIDIA GPU (tested on RTX 4050)

Dataset

This project uses the IAM Handwriting Database.

U. Marti and H. Bunke "The IAM-database: an English sentence database for offline handwriting recognition" International Journal on Document Analysis and Recognition, 2002.

Dataset: http://www.fki.inf.unibe.ch/databases/iam-handwriting-database

The dataset is not included in this repository due to licensing restrictions.

Note: The IAM Handwriting Database is restricted to non-commercial research use. This repository only contains code and does not redistribute the dataset.

Expected Dataset Structure

data/
 └── raw/
      ├── words/
      └── words.txt

Data Pipeline

The IAM words dataset is processed into a structured CSV format for training.

Steps:

Parse words.txt
Filter valid samples (ok)
Reconstruct image file paths
Validate image existence
Generate dataset.csv

Output Format

image_path,label
data/raw/words/a01/a01-000u/a01-000u-00-00.png,A

This CSV serves as the input for the PyTorch Dataset class.

Setup

Create a virtual environment:

py -3.12 -m venv venv

Activate it:

venv\Scripts\Activate.ps1

Upgrade pip:

python -m pip install --upgrade pip

Install dependencies:

pip install -r requirements.txt

Quick Start

Verify that PyTorch and CUDA are correctly configured:

python -c "import torch; print(torch.cuda.is_available())"

Expected output:

True

Outputs

Processed dataset:

data/processed/dataset.csv

This file contains image paths and corresponding labels used for training.

Planned Architecture

The final OCR pipeline will follow this structure:

Page Image
   ↓
Line Segmentation
   ↓
CRNN Recognition
   ↓
Text Output

Recognition Model

CNN → BiLSTM → CTC Loss

CNN extracts visual features from the image
BiLSTM models sequential dependencies
CTC Loss enables alignment-free sequence prediction

Project Structure

handwriting-crnn/
│
├── data/                # Dataset directory (not included in repo)
│   ├── raw/             # Original IAM dataset (words/, words.txt)
│   └── processed/       # Generated dataset (dataset.csv)
│
├── src/                 
│   └── dataset/         # Dataset handling
│       └── iam_dataset.py
│
├── prepare_dataset.py   # Script to generate dataset.csv from words.txt
├── test_dataset.py      # Temporary script to verify dataset loading
│
├── requirements.txt
├── requirements-min.txt
├── LICENSE
└── README.md

Roadmap

Phase 0 — Environment Setup ✔
Phase 1 — Dataset Preparation ✔
Phase 2 — CRNN Model Implementation
Phase 3 — Training Pipeline
Phase 4 — Line Segmentation
Phase 5 — Full Page OCR Pipeline

Future Work

Implement CRNN architecture
Build PyTorch Dataset and DataLoader
Train model using CTC loss
Add decoding (greedy / beam search)
Develop line segmentation pipeline
Create demo interface for handwritten text recognition

License

This project is licensed under the MIT License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
inference.py		inference.py
prepare_dataset.py		prepare_dataset.py
requirements.txt		requirements.txt
test_cuda.py		test_cuda.py
test_dataset.py		test_dataset.py
test_model.py		test_model.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Handwriting CRNN

Project Status

Requirements

Dataset

Expected Dataset Structure

Data Pipeline

Output Format

Setup

Quick Start

Outputs

Planned Architecture

Recognition Model

Project Structure

Roadmap

Future Work

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Handwriting CRNN

Project Status

Requirements

Dataset

Expected Dataset Structure

Data Pipeline

Output Format

Setup

Quick Start

Outputs

Planned Architecture

Recognition Model

Project Structure

Roadmap

Future Work

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages