An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Cited by: 0
Authors
Malay Kumar Majhi [1]
Sujan Kumar Saha [1]
Affiliations
[1] Department of CSE, National Institute of Technology Durgapur, Durgapur
Keywords
Attention network; Automatic speech recognition; Continuous speech recognition; Data augmentation; Odia ASR
DOI
10.1007/s10772-024-10132-6
Abstract
This paper presents an automatic speech recognition (ASR) system developed for the Indian language Odia. In recent years, deep learning models have been widely used to develop ASR systems for various languages and domains. These models demand large training resources, primarily annotated continuous speech utterances collected from many speakers. However, sufficient speech corpora are not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is developed using BiLSTM and the Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied. The augmented data is used along with the original training data, and a substantial performance improvement is observed. The effectiveness of the attention mechanism in Odia ASR is also explored. When an attention layer is embedded in the baseline BiLSTM model, the resulting system outperforms both the baseline model and existing Odia ASR systems in the literature. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
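The augmentation step described in the abstract (pitch alteration and time stretching) is not specified in implementation detail here, so the following is a minimal NumPy-only sketch of resampling-based speed perturbation, a closely related augmentation that alters duration and pitch together. Pitch-preserving time stretching and duration-preserving pitch shifting are usually done with a phase vocoder, e.g. `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`; this sketch is an illustration, not the authors' method.

```python
# Illustrative resampling-based speed perturbation for speech augmentation.
# NOTE: a stand-in, not the paper's exact pitch-alteration/time-stretching
# pipeline; pitch-preserving variants require a phase vocoder.
import numpy as np

def speed_perturb(x: np.ndarray, rate: float) -> np.ndarray:
    """Resample a mono signal by `rate`: rate > 1 shortens the signal
    (and raises pitch), rate < 1 lengthens it (and lowers pitch)."""
    n_out = int(len(x) / rate)
    # Sample positions in the original signal for the new duration
    idx = np.linspace(0, len(x) - 1, num=n_out)
    return np.interp(idx, np.arange(len(x)), x)

# Example: a 1-second 220 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)
fast = speed_perturb(x, 1.1)  # ~0.91 s, slightly higher pitch
slow = speed_perturb(x, 0.9)  # ~1.11 s, slightly lower pitch
```

Applying such perturbations to a phonetically rich subset of the training data, as the paper describes, yields additional utterance variants without new recordings.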
Pages: 717-728
Number of pages: 11