An End-to-End Continuous Speech Recognition System in Bengali for General and Elderly Domain

Cited by: 0
Authors
Shubhojeet Paul [1]
Vandana Bhattacharjee [1]
Sujan Kumar Saha [2]
Affiliations
[1] Birla Institute of Technology, Mesra, Department of Computer Science and Engg.
[2] National Institute of Technology, Durgapur
Keywords
Automatic speech recognition; Deep neural networks; Low resource; Elder speaker; Bengali
DOI
10.1007/s42979-025-04058-2
Abstract
Although a substantial amount of research has been carried out on Automatic Speech Recognition (ASR) in Bengali, we did not find any open ASR system that works well when tested with utterances of elderly people. Developing a new system for the elderly domain from scratch demands a large amount of training data. However, such domain-specific Bengali ASR resources are not available, and creating sufficient resources is costly and time-consuming. In this paper, we investigate the effectiveness of transfer learning, using a small in-house domain-specific dataset together with available general-domain open resources. First, we develop a CNN-BiGRU network using the OpenSLR data for generic-domain Bengali. Then we use the network in a transfer learning architecture in which only 5 h of elderly speech data is used. On the existing open general-domain data, the system achieved a character error rate (CER) of 7.72%. When the same system was tested on the elderly data, the CER rose to 19.46%. With the proposed transfer learning framework, the CER improved to 12.37%. In our experiments, we found that the proposed model outperforms the existing systems tested on the elderly domain. This improvement demonstrates the effectiveness of transfer learning for developing ASR systems in languages or domains where sufficient training resources are not available.
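The abstract reports its results as character error rate (CER). As a minimal sketch of how such a figure is commonly computed, the Python below takes the Levenshtein edit distance between a reference and a hypothesis transcript and divides it by the reference length. The function names (`edit_distance`, `cer`, `relative_reduction`) are illustrative assumptions, not taken from the paper. The sketch also compares Unicode code points, whereas a Bengali evaluation may instead segment text into grapheme clusters, so it should not be read as the authors' exact scoring procedure.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, compared per code point."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction in percent when moving from `before` to `after`."""
    return 100.0 * (before - after) / before
```

Applied to the figures quoted in the abstract, `relative_reduction(19.46, 12.37)` gives a relative CER reduction of roughly 36% on the elderly test data after transfer learning.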