Course unit: DSIA-5101C
Author: Arthur Lecert
Teacher: Angelo Corsero
In this project, I combine image processing with the Tesseract library to tackle the OCR problem. Using functional programming was mandatory in this course.
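As an illustration of how the pieces fit together, here is a minimal sketch of an OCR call, assuming Tesseract is reached through the Tess4J Java wrapper (the wrapper, the data path, and the file names are assumptions, not details taken from this project):

```scala
import java.io.File
import net.sourceforge.tess4j.Tesseract

object OcrExample {
  def main(args: Array[String]): Unit = {
    val tesseract = new Tesseract()
    // Directory containing the *.traineddata language files (illustrative path).
    tesseract.setDatapath("/usr/share/tesseract-ocr/4.00/tessdata")
    tesseract.setLanguage("eng")
    // doOCR returns the recognised text for the whole image.
    val text = tesseract.doOCR(new File("data/sample.png"))
    println(text)
  }
}
```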
Several preprocessing methods exist to improve the quality of the prediction; binarisation is the one implemented in this code.
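Below is a minimal sketch of a fixed-threshold binarisation over a `BufferedImage`; the threshold value, object name, and file paths are illustrative and not the project's actual implementation:

```scala
import java.awt.image.BufferedImage
import java.io.File
import javax.imageio.ImageIO

object Binarisation {
  /** Map each pixel to black or white using a fixed luminance threshold. */
  def binarise(input: BufferedImage, threshold: Int = 128): BufferedImage = {
    val out = new BufferedImage(input.getWidth, input.getHeight, BufferedImage.TYPE_BYTE_BINARY)
    for {
      x <- 0 until input.getWidth
      y <- 0 until input.getHeight
    } {
      val rgb = input.getRGB(x, y)
      val r = (rgb >> 16) & 0xFF
      val g = (rgb >> 8) & 0xFF
      val b = rgb & 0xFF
      val luminance = 0.299 * r + 0.587 * g + 0.114 * b
      out.setRGB(x, y, if (luminance < threshold) 0x000000 else 0xFFFFFF)
    }
    out
  }

  def main(args: Array[String]): Unit = {
    // Illustrative file names; adapt to the project's actual data layout.
    val binarised = binarise(ImageIO.read(new File("data/sample.png")))
    ImageIO.write(binarised, "png", new File("data/sample-bin.png"))
  }
}
```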
The IAM Handwriting Database contains forms of handwritten English text which can be used to train and test handwritten text recognizers and to perform writer identification and verification experiments.
The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. Outdoor street-level imagery has two notable characteristics: (1) image text often comes from business signage, and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses. More details about the dataset can be found in the paper Word Spotting in the Wild; for up-to-date benchmarks on this data, see the paper End-to-end Scene Text Recognition. This dataset only has word-level annotations (no character bounding boxes) and should be used for:
- cropped lexicon-driven word recognition and
- full image lexicon-driven word detection and recognition.
Alternatively, you can use IntelliJ IDEA with the Scala and sbt plugins.
From the root directory of the project, compile it:

```
sbt compile
```

Run the project:

```
sbt run
```
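For reference, an illustrative `build.sbt` along these lines would pull in the Tess4J wrapper assumed in the sketch above; the project name, Scala version, and dependency version are assumptions, not values taken from this repository:

```scala
// Illustrative build definition, not the project's actual build.sbt.
name := "ocr-project"
scalaVersion := "2.13.12"

// Tess4J provides Java bindings to the native Tesseract library.
libraryDependencies += "net.sourceforge.tess4j" % "tess4j" % "4.5.4"
```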