Official Implementation for IEEE BigData 2024 Submission
This project focuses on recognizing Bangladeshi dialects and converting diverse Bengali accents into standardized formal Bengali speech. Little accurate work exists on Bangla dialects, owing to a shortage of large, diverse datasets and a reliance on traditional approaches. We developed a large dataset of dialectal speech signals and used it to fine-tune LLMs for two tasks: dialect speech recognition, and translation of dialect text into standard Bangla text. Our fine-tuned Whisper model achieved a CER of 0.8% and a WER of 1.5%. For translation, the BanglaT5 model attained a BLEU score of 41.6% on dialect-to-standard text translation. Finally, using AlignTTS, we completed our end-to-end pipeline for dialect standardization.
---
1. Data Preprocessing: The audio input signal is converted to WAV format, denoised, and split into manageable 5-second speech segments; dialect text and standard text are likewise segmented into corresponding chunks.
2. Fine-Tuning: Dialect speech and dialect text are used to fine-tune one LLM for speech transcription, and a second LLM is fine-tuned for machine translation from dialect text to standard Bangla text. Finally, AlignTTS generates a standard Bangla speech signal from the translated standard Bangla text.
3. Evaluation: CER, WER, and BLEU scores are used to measure model performance.
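The 5-second segmentation in step 1 can be sketched as below. This is an illustrative stand-in, not the project's actual preprocessing code; the function name, sample rate, and segment length default are our own choices.

```python
# Illustrative sketch of the 5-second segmentation step (step 1 above).
# The helper name and 16 kHz sample rate are assumptions, not taken from the repo.

def split_into_segments(samples, sample_rate, segment_seconds=5):
    """Split a 1-D sequence of audio samples into fixed-length segments.

    The final segment may be shorter than segment_seconds.
    """
    segment_len = sample_rate * segment_seconds
    return [samples[i:i + segment_len]
            for i in range(0, len(samples), segment_len)]

# Example: 12 seconds of silence at 16 kHz -> three segments (5 s, 5 s, 2 s).
audio = [0.0] * (16000 * 12)
segments = split_into_segments(audio, sample_rate=16000)
print(len(segments))             # 3
print(len(segments[0]) / 16000)  # 5.0
```

In practice the repository would read the WAV files and apply noise reduction before this split; the chunking logic itself is the part shown here.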
Fig. 1: (a) Data preprocessing [1]; (b) Fine-tuning LLMs [2]; (c) End-to-end generation
Comparative evaluation of pretrained and fine-tuned models on the dialect speech-to-text and dialect-to-standard text translation tasks.
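The CER and WER metrics used in the evaluation are edit-distance-based; the repository installs `jiwer` for this, but a minimal self-contained sketch looks like the following (function names are ours):

```python
# Minimal sketch of CER/WER as normalized Levenshtein distance.
# In the actual pipeline these would come from a library such as jiwer.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (rolling-array DP)."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat", "the cat sit"))  # 1 substitution / 3 words
```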
- Clone the repository:

```shell
git clone https://github.com/EncryptedBinary/BanglaDialecto.git
cd BanglaDialecto
```

- Install Required Packages: run the following commands to install the necessary libraries:

```shell
pip install transformers
pip install jiwer
```

Dataset splits:

- Training: 6270 samples
- Validation: 810 samples
- Testing: 120 samples
Feel free to modify the splits or experiment with different datasets based on your use case.
For those interested in fine-tuning the models further, we recommend checking out the `train.py` script, which includes hyperparameters and configurations for:
- Epochs: The ASR model is trained for 10 epochs with a batch size of 16; the translation model is trained for 25 epochs with a batch size of 6.
- Loss Function:
- Optimization:
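The stated training settings can be summarized in a small config sketch. The dictionary and helper below are illustrative (names are ours, and values not given above, such as learning rate or loss, are deliberately omitted):

```python
# Illustrative configs reflecting the epoch/batch settings stated above.
# These are not the repo's actual train.py configuration objects.

ASR_CONFIG = {"epochs": 10, "batch_size": 16}          # Whisper fine-tuning (ASR)
TRANSLATION_CONFIG = {"epochs": 25, "batch_size": 6}   # BanglaT5 fine-tuning (MT)

def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps per epoch, counting the final partial batch."""
    return -(-num_samples // batch_size)  # ceiling division

# With the 6270-sample training split above:
print(steps_per_epoch(6270, ASR_CONFIG["batch_size"]))          # 392
print(steps_per_epoch(6270, TRANSLATION_CONFIG["batch_size"]))  # 1045
```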
[1] M. A. Al Amin, M. T. Islam, S. Kibria, and M. S. Rahman, "Continuous Bengali speech recognition based on deep neural network," in 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), IEEE, 2019, pp. 1–6. (https://ieeexplore.ieee.org/document/8679341)

[2] S. Khan, M. Pal, J. Basu, M. S. Bepari, and R. Roy, "Assessing performance of Bengali speech recognizers under real world conditions using GMM-HMM and DNN based methods," in SLTU, 2018, pp. 192–196. (https://www.researchgate.net/publication/328068468_Assessing_Performance_of_Bengali_Speech_Recognizers_Under_Real_World_Conditions_using_GMM-HMM_and_DNN_based_Methods)

[3] A. M. Samin, M. H. Kobir, S. Kibria, and M. S. Rahman, "Deep learning based large vocabulary continuous speech recognition of an under-resourced language Bangladeshi Bangla," Acoustical Science and Technology, vol. 42, no. 5, pp. 252–260, 2021. (https://www.jstage.jst.go.jp/article/ast/42/5/42_E2079/_article/-char/ja/)

[4] P. R. Gudepu, G. P. Vadisetti, A. Niranjan, K. Saranu, R. Sarma, M. A. B. Shaik, and P. Paramasivam, "Whisper augmented end-to-end/hybrid speech recognition system—CycleGAN approach," in INTERSPEECH, 2020, pp. 2302–2306. (https://www.isca-archive.org/interspeech_2020/gudepu20_interspeech.html)
