Introductory Course on Transkribus and HTR
This course explains and demonstrates Transkribus, a popular Handwritten Text Recognition (HTR) platform for making historical documents more readable and accessible.
Over two sessions you will get to grips with the basics of transcribing handwritten documents in this software and learn how to build your own HTR model, while avoiding the pitfalls of automatic transcription.
In the first session we will cover
- The role of HTR and OCR technologies and how they differ as tools
- Uploading documents to Transkribus
- Identifying text zones and images
- Segmentation and baselines
- Working with distorted or broken text
- Abbreviations, contractions and unusual characters
- Transcribing written material manually
In the second session we will
- Train and run an HTR model
- Discuss a suitable Character Error Rate (CER)
- Look at how to improve the CER and the model
- Use the keyword spotting tool
- Explore other functions of Transkribus
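As background for the CER discussion, here is a minimal sketch of how a Character Error Rate is commonly computed: the number of single-character edits (insertions, deletions, substitutions) needed to turn the automatic transcription into the correct one, divided by the length of the correct text. The function names and the sample strings are illustrative assumptions, and the figure Transkribus reports may be calculated with different normalisation, so treat this as an explanation of the idea rather than a reproduction of the tool.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn
    `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("The reference transcription must not be empty.")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: a manual transcription versus a model's output.
manual = "Duchess of Atholl"
automatic = "Duchcss of Athol"
print(f"CER: {character_error_rate(manual, automatic):.1%}")  # prints "CER: 11.8%"
```

In this example the model made two mistakes (one wrong letter and one missing letter) across 17 characters, which is why the CER comes out at roughly 12%. A lower CER means the automatic transcription is closer to the manual one.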
In this repo you will find
- A folder of images that we will use for practice, called 'Dutchess of Atholl'. Please download it and save it on your device.
- Information on this course, and instructions on what to prepare before class, in this Readme file.
Before the first class you should
- Download the folder labelled 'Dutchess of Atholl' and save it somewhere convenient on your personal device (you do not need to upload it to Transkribus yet; we will do that in class)
- Download the Transkribus Expert Client here: https://readcoop.eu/transkribus/download/ Note: to work with Transkribus you need to have Java installed. You can download the latest version from the official Oracle website here: https://www.oracle.com/java/technologies/downloads/
- Register for a free Transkribus account. You will need this to log into the software every time you use it
This Repo has been created by Sarah van Eyndhoven and has a 