Automatic speech recognition systems: A survey of discriminative techniques

Times cited: 6
Authors
Kaur, Amrit Preet [1]
Singh, Amitoj [2]
Sachdeva, Rohit [3]
Kukreja, Vinay [4]
Affiliations
[1] Punjabi University, Patiala, Punjab, India
[2] Jagat Guru Nanak Dev Punjab State Open University, Patiala, Punjab, India
[3] Multani Mal Modi College, Patiala, Punjab, India
[4] Chitkara University, Institute of Engineering & Technology, Rajpura, Punjab, India
Keywords
Speech recognition; Acoustic modeling; Feature extraction; Deep learning; Speech processing; Convolutional neural networks; Acoustic models; Speaker; Hindi; Architecture; Languages; Accurate; Features
DOI
10.1007/s11042-022-13645-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Speech recognition is an important research topic in the field of pattern recognition. In this study, we give a detailed assessment of speech recognition strategies for several majority languages. Over the last several decades, many researchers have contributed to the field of speech processing and recognition. Although several frameworks exist for speech processing and recognition, only a few ASR systems are available for the world's many languages. The data gathered for this research further reveal that most of the effort has gone into building ASR systems for majority languages, whereas minority languages suffer from the lack of a standard speech corpus. We also examine some of the key issues in speech recognition across languages, and we explore various hybrid acoustic modeling methods needed for efficient results. Because a classifier's success depends on the information extracted during the feature extraction phase, feature extraction techniques and classifiers must be chosen carefully.
Pages: 13307-13339
Page count: 33
References: 173