
BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Official Implementation for IEEE BigData 2024 Submission

Abstract

This project focuses on recognizing Bangladeshi dialects and converting diverse Bengali accents into standardized formal Bengali speech. Little notable work has been done on Bangla dialects, owing to a shortage of large, diverse datasets and a reliance on traditional approaches. We developed a large dataset of dialectal speech signals and used it to fine-tune LLMs for two tasks: dialect speech recognition, and translation of dialect text to standard Bangla text. Our fine-tuned Whisper model achieved a CER of 0.8% and a WER of 1.5%. For translation, the BanglaT5 model attained a BLEU score of 41.6% on dialect-to-standard text translation. Finally, by incorporating AlignTTS, we completed our end-to-end pipeline for dialect standardization.

Dataset Statistics

(Figure: dataset statistics)

Methodology

1. Data Preprocessing: The input audio signal is converted to WAV format, then undergoes noise reduction and is split into manageable 5-second speech segments; similarly, dialect text and standard text are segmented into corresponding chunks.

2. Fine-Tuning: Dialect speech and dialect text are used to fine-tune one LLM for speech transcription, and a second LLM is fine-tuned for machine translation from dialect text to standard Bangla text. Finally, AlignTTS is used to generate the standard Bangla speech signal from the translated standard Bangla text.

3. Evaluation: CER, WER, and BLEU scores are used to measure model performance.
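The CER and WER used in the evaluation step can be illustrated with a small self-contained sketch. The repository installs `jiwer`, which provides these metrics directly; the standalone edit-distance version below is only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = dp[0]
        dp[0] = i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (free if equal)
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat down", "the cat sat")` is 0.25 (one deleted word out of four reference words).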

Fig. 1: (a) Data preprocessing [1]; (b) Fine-tuning LLMs [2]; (c) End-to-end generation
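The 5-second segmentation in preprocessing step 1 can be sketched as follows. This is a minimal illustration only: the actual pipeline also performs noise reduction, and the sample rate and padding behavior here are assumptions, not the repository's exact code.

```python
def split_into_segments(samples, sample_rate, seconds=5.0):
    """Split a mono waveform (a flat list of samples) into fixed-length chunks.

    The final chunk may be shorter than `seconds`; a real pipeline might
    pad it with silence or drop it.
    """
    chunk = int(sample_rate * seconds)
    return [samples[i:i + chunk] for i in range(0, len(samples), chunk)]

# 12 s of silence at an assumed 16 kHz -> three segments: 5 s, 5 s, 2 s
audio = [0.0] * (16_000 * 12)
segments = split_into_segments(audio, 16_000)
```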

Result

Comparative evaluation of pretrained and fine-tuned models on the dialect speech-to-text and dialect-text-to-standard-text translation tasks.


1. Clone the Repository

```bash
git clone https://github.com/EncryptedBinary/BanglaDialecto.git
cd BanglaDialecto
```

2. Install Required Packages

Run the following commands to install the necessary libraries:

```bash
pip install transformers
pip install jiwer
```

🧪 Train-Test-Split

  • Training: 6270 samples
  • Validation: 810 samples
  • Testing: 120 samples

Feel free to modify the splits or experiment with different datasets based on your use case.
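The split sizes above (6270 / 810 / 120, totalling 7200 samples) could be reproduced with a deterministic shuffle. The index-based helper below is an illustrative assumption, not the repository's actual splitting code.

```python
import random

def make_splits(n_samples, n_train=6270, n_val=810, n_test=120, seed=42):
    """Deterministically partition sample indices into train/val/test lists."""
    assert n_train + n_val + n_test <= n_samples
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # fixed seed -> reproducible splits
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:n_train + n_val + n_test])

train_idx, val_idx, test_idx = make_splits(7200)
```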

📚 Model Training

For those interested in fine-tuning the models further, we recommend checking out the train.py script, which includes hyperparameters and configurations for:

  • Epochs: The ASR model is trained for 10 epochs with a batch size of 16; the translation model is trained for 25 epochs with a batch size of 6.
  • Loss Function:
  • Optimization:
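As a rough sketch, the stated epochs and batch sizes map onto a Hugging Face Trainer configuration like the one below. Only the epoch counts and batch sizes come from this README; the output directories and evaluation strategy are placeholder assumptions — see `train.py` for the actual configuration.

```python
from transformers import Seq2SeqTrainingArguments

# ASR (Whisper) fine-tuning: 10 epochs, batch size 16 (from this README);
# output_dir and evaluation_strategy are illustrative placeholders.
asr_args = Seq2SeqTrainingArguments(
    output_dir="whisper-bangla-dialect",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)

# Translation (BanglaT5) fine-tuning: 25 epochs, batch size 6.
mt_args = Seq2SeqTrainingArguments(
    output_dir="banglat5-dialect-mt",
    num_train_epochs=25,
    per_device_train_batch_size=6,
    evaluation_strategy="epoch",
)
```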

References

[1] M. A. Al Amin, M. T. Islam, S. Kibria, and M. S. Rahman, "Continuous Bengali speech recognition based on deep neural network," in 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE, 2019, pp. 1–6. (https://ieeexplore.ieee.org/document/8679341)

[2] S. Khan, M. Pal, J. Basu, M. S. Bepari, and R. Roy, "Assessing performance of Bengali speech recognizers under real world conditions using GMM-HMM and DNN based methods," in SLTU, 2018, pp. 192–196. (https://www.researchgate.net/publication/328068468_Assessing_Performance_of_Bengali_Speech_Recognizers_Under_Real_World_Conditions_using_GMM-HMM_and_DNN_based_Methods)

[3] A. M. Samin, M. H. Kobir, S. Kibria, and M. S. Rahman, "Deep learning based large vocabulary continuous speech recognition of an under-resourced language Bangladeshi Bangla," Acoustical Science and Technology, vol. 42, no. 5, pp. 252–260, 2021. (https://www.jstage.jst.go.jp/article/ast/42/5/42_E2079/_article/-char/ja/)

[4] P. R. Gudepu, G. P. Vadisetti, A. Niranjan, K. Saranu, R. Sarma, M. A. B. Shaik, and P. Paramasivam, "Whisper augmented end-to-end/hybrid speech recognition system — CycleGAN approach," in INTERSPEECH, 2020, pp. 2302–2306. (https://www.isca-archive.org/interspeech_2020/gudepu20_interspeech.html)
