
A large-scale Quran audio dataset featuring ... reciters and word-level alignment — built for Arabic speech research.
Tadabur covers the entire Qur’an with automatic word-level timestamp alignment and structured JSON metadata for each file. It is the largest and most diverse publicly available Qur’anic speech dataset.
| 🕐 Audio | 1400+ hours |
| 🎙️ Reciters | 600+ |
| 🔗 Alignment | Word-level timestamps |
Each example contains an audio file and a JSON metadata file:
{
"reciter_id": 88,
"surah_id": 3,
"ayah_id": 82,
"word_alignments": [
{
"word": "أفلا",
"start": 0.089,
"end": 0.87
},
...
],
"text_ar_simple": "افلا يتدبرون القران ولو كان من عند غير الله لوجدوا فيه اختلافا كثيرا",
"text_ar_uthmani": "أَفَلَا يَتَدَبَّرُونَ ٱلْقُرْءَانَ ۚ وَلَوْ كَانَ مِنْ عِندِ غَيْرِ ٱللَّهِ لَوَجَدُوا۟ فِيهِ ٱخْتِلَـٰفًا كَثِيرًا",
"ayah_duration_s": 10.9,
"audio_filename": "tadabur_spk0088_S3_A82_db1f8e71_000003.wav"
}Whisper models fine-tuned on Tadabur for Qur'anic ASR, available on HuggingFace:
| Model | Base | Status | Link |
|---|---|---|---|
| Tadabur-Whisper-Small | Whisper Small | ✅ Available | 🤗 HuggingFace |
@misc{alherran2026tadabur,
author = {Alherran, Faisal},
title = {Tadabur: A Large-Scale Quran Audio Dataset},
year = {2026},
url = {https://github.com/fherran/tadabur}
}For questions or collaboration, feel free to reach out on LinkedIn.