This repository is a PyTorch implementation of several optical music recognition techniques. The goal is to take an image of a music score as input and produce a MIDI file as output. Currently, the main implemented architecture is a CRNN. A Transformer architecture will follow soon.
```bash
conda env create -f environment.yaml
conda activate music_recognition
```

In `scripts/download.py`, we provide a script to download all relevant datasets for Optical Music Recognition. The script handles two types of datasets:
- Camera Primus Dataset: Real sheet music images with semantic labels (for training)
- SMT HuggingFace Datasets: Bekern sequences (for synthetic data generation)
```bash
# Download the Camera Primus dataset
python scripts/download.py --primus

# Download all SMT datasets
python scripts/download.py --smt-all
```

Available Command Line Arguments:
| Argument | Description | Source |
|---|---|---|
| `--primus` | Download Camera Primus dataset | https://grfia.dlsi.ua.es/primus/ |
| `--smt <dataset>` | Download specific SMT dataset(s) | HuggingFace (`antoniorv6/*`) |
| `--smt-all` | Download all SMT datasets | HuggingFace (`antoniorv6/*`) |
| `--list-smt` | List available SMT datasets | - |
| `--output_dir` | Base output directory | Default: `data/datasets` |
| `--splits` | Dataset splits to download | Default: `train`, `validation`, `test` |
Available SMT Datasets:
- `grandstaff`: GrandStaff system-level (original format)
- `grandstaff-ekern`: GrandStaff in ekern format
- `grandstaff-bekern`: GrandStaff in bekern format
- `mozarteum`: Mozarteum dataset
- `polish-scores`: Polish Scores dataset
- `string-quartets`: String Quartets dataset
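Once downloaded, an SMT dataset can also be inspected directly with the HuggingFace `datasets` library. The snippet below is a minimal sketch that assumes the `antoniorv6/grandstaff` repository id following the `antoniorv6/*` pattern from the table above; the exact split and column names vary between datasets.

```python
# Minimal sketch: loading one SMT dataset from the HuggingFace Hub.
# The repository id and split name are assumptions based on the antoniorv6/* pattern above.
from datasets import load_dataset

dataset = load_dataset("antoniorv6/grandstaff", split="train")

print(dataset)            # number of rows and column names
print(dataset[0].keys())  # fields of a single example (e.g. image and bekern sequence)
```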
The data will be stored in the `data/datasets/` folder, organized as:
- `data/datasets/primus/`: Camera Primus dataset (images + semantic labels)
- `data/datasets/smt_datasets/`: SMT bekern datasets (for synthetic generation)
```
├── data/
│   ├── datasets/                       # Downloaded datasets
│   │   ├── primus/                     # Camera Primus dataset (images + semantic labels)
│   │   ├── smt_datasets/               # SMT bekern datasets (for synthetic generation)
│   │   └── synthetic/                  # Generated synthetic data
│   └── utils/                          # Data processing utilities
│       ├── format_converter.py         # Primus to bekern format conversion
│       └── synthetic_generator.py      # Synthetic image generation using Verovio
├── networks/                           # Neural networks for OMR tasks
├── scripts/                            # Utility scripts
│   ├── download.py                     # Dataset download script
│   └── generate_synthetic_data.py      # Synthetic data generation script
└── ...
```
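The synthetic pipeline in `data/utils/synthetic_generator.py` relies on Verovio to render symbolic music into score images. The snippet below is a minimal sketch of that idea using the `verovio` Python package and `cairosvg` for rasterization; it is not the repository's exact implementation, and the input path is only a placeholder.

```python
# Minimal sketch of rendering symbolic music to an image with Verovio.
# This is an illustration, not the code in synthetic_generator.py;
# the input file path is a placeholder.
import verovio
import cairosvg

tk = verovio.toolkit()

# Load a **kern (or MEI/MusicXML) file and render the first page to SVG.
tk.loadFile("example.krn")
svg = tk.renderToSVG(1)

# Rasterize the SVG so it can be used as a training image.
cairosvg.svg2png(bytestring=svg.encode("utf-8"), write_to="example.png")
```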
The first implemented neural network is a Convolutional Recurrent Neural Network (CRNN), reimplemented from the Camera Primus paper. The architecture consists of a stack of convolutional layers followed by several BiLSTMs and linear layers. Batch normalization is applied before each activation to keep the gradients in an active regime.
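The sketch below illustrates this layer layout in PyTorch: convolution + batch normalization + activation blocks, followed by a BiLSTM stack and a linear projection per image column. All layer sizes are illustrative placeholders, not the values used in this repository.

```python
# Minimal CRNN sketch; layer sizes are illustrative, not the repository's configuration.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int, img_height: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.LeakyReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.LeakyReLU(),
            nn.MaxPool2d(2),
        )
        feat_height = img_height // 4             # two 2x2 max-poolings
        self.rnn = nn.LSTM(
            input_size=64 * feat_height, hidden_size=256,
            num_layers=2, bidirectional=True, batch_first=True,
        )
        self.fc = nn.Linear(2 * 256, num_classes)  # 2x for the two BiLSTM directions

    def forward(self, x):                          # x: (batch, 1, H, W)
        f = self.cnn(x)                            # (batch, C, H', W')
        f = f.permute(0, 3, 1, 2).flatten(2)       # one feature vector per image column
        out, _ = self.rnn(f)                       # (batch, W', 2 * hidden)
        return self.fc(out)                        # (batch, W', num_classes)
```

In the original paper this kind of network is trained with a CTC loss over the per-column predictions, which avoids the need for symbol-level alignment between the image and the label sequence.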
The implementation of this stage is almost complete. However, the original paper trained for 64,000 epochs, which is infeasible with the compute power currently available.
The second architecture is a Transformer reimplemented from the TrOMR paper. It uses transfer learning with a pretrained Vision Transformer to predict sequences of music symbols.
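As a rough illustration of that idea, the sketch below pairs a pretrained Vision Transformer encoder from `torchvision` with a standard `nn.TransformerDecoder` that predicts symbol tokens. The backbone choice, vocabulary size, and dimensions are assumptions for illustration, not the TrOMR configuration; a full implementation would feed all patch embeddings (not just the class token) to the decoder.

```python
# Minimal sketch: pretrained ViT encoder + Transformer decoder over music-symbol tokens.
# Backbone, vocab size, and dimensions are illustrative assumptions only.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

class ScoreTransformer(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 768, max_len: int = 512):
        super().__init__()
        self.encoder = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)  # pretrained ViT backbone
        self.encoder.heads = nn.Identity()        # drop the classification head
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # images: (batch, 3, 224, 224); tokens: (batch, seq_len) of symbol ids
        memory = self.encoder(images).unsqueeze(1)              # simplified: class token only
        positions = torch.arange(tokens.size(1), device=tokens.device)
        tgt = self.embed(tokens) + self.pos(positions)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden)                                 # (batch, seq_len, vocab_size)
```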