Automatic Speech Recognition: A survey of deep learning techniques and approaches

Cited by: 0
Authors
Ahlawat, Harsh [1 ]
Aggarwal, Naveen [1 ]
Gupta, Deepti [1 ]
Affiliations
[1] University Institute of Engineering and Technology, Panjab University, Chandigarh
Source
International Journal of Cognitive Computing in Engineering | 2025, Vol. 6
Keywords
Automatic Speech Recognition; Conformer; Datasets; Deep learning; Deep Neural Networks; Multilingual; Transformer
DOI
10.1016/j.ijcce.2024.12.007
Abstract
Significant research has been conducted over the last decade on the application of machine learning to speech processing, particularly speech recognition, and in recent years deep learning models have shown promising results across speech-related applications. With the emergence of end-to-end models, deep learning has revolutionized the field of Automatic Speech Recognition (ASR), and the recent surge in transfer learning and attention-based approaches trained on large datasets has given ASR further impetus. This paper provides a thorough review of the numerous studies conducted since 2010 and an extensive comparison of the state-of-the-art methods currently used in this research area, with a special focus on deep learning models and an analysis of contemporary approaches for both monolingual and multilingual ASR. Deep learning approaches are data-dependent, and their accuracy varies across datasets. We therefore also analyze a range of models on publicly accessible speech datasets to understand how performance generalizes across diverse data for practical deployment. The study further highlights research findings, open challenges, and the way forward, serving as a starting point for researchers interested in open-source ASR, particularly in mitigating data dependency and improving generalizability across low-resource languages, speaker variability, and noise conditions. © 2025 The Authors
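The dataset dependence noted in the abstract is typically quantified by measuring Word Error Rate (WER) of a pretrained end-to-end model on a public test set. The sketch below is illustrative only and is not taken from the surveyed paper; it assumes the Hugging Face transformers and datasets packages plus the jiwer WER library, and the model and dataset names are arbitrary example choices.

# Minimal sketch: scoring a pretrained end-to-end ASR model on a public test set.
# Assumptions (not from the paper): Hugging Face `transformers` + `datasets`,
# the `jiwer` package for Word Error Rate, and illustrative model/dataset names.
from datasets import load_dataset
from transformers import pipeline
import jiwer

# Any pretrained checkpoint could be substituted here.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# A small slice of LibriSpeech test-clean, purely to keep the sketch fast.
test_set = load_dataset("librispeech_asr", "clean", split="test[:20]")

references, hypotheses = [], []
for sample in test_set:
    audio = {"raw": sample["audio"]["array"],
             "sampling_rate": sample["audio"]["sampling_rate"]}
    hypotheses.append(asr(audio)["text"].lower())   # model transcript
    references.append(sample["text"].lower())       # ground-truth transcript

# Lower WER means better recognition; repeating the same loop on other public
# corpora (e.g., Common Voice or TED-LIUM) exposes the cross-dataset variability
# discussed in the abstract.
print(f"WER: {jiwer.wer(references, hypotheses):.3f}")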
Pages: 201-237
Number of pages: 36