Automatic speech recognition: a survey

Cited by: 175
Authors
Malik, Mishaim [1]
Malik, Muhammad Kamran [1]
Mehmood, Khawar [2]
Makhdoom, Imran [3]
Affiliations
[1] Punjab Univ Coll Informat Technol PUCIT, Lahore, Pakistan
[2] Univ New South Wales UNSW Canberra, ADFA, Sch Engn & Informat Technol, Canberra, ACT, Australia
[3] Univ Technol Sydney, Fac Engn & IT, Ultimo, Australia
Keywords
Speech recognition; ASR; Automatic speech recognition; Feature extraction; Classification models; Language models; Hidden Markov models; Support vector machines; Neural networks; Extraction; Optimization; Combination; Features
DOI
10.1007/s11042-020-10073-7
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, great strides have been made in the field of automatic speech recognition (ASR) by using various deep learning techniques. In this study, we present a thorough comparison of the cutting-edge techniques currently used in this area, with a special focus on deep learning methods. The study explores different feature extraction methods and state-of-the-art classification models, along with their impact on an ASR system. Because deep learning techniques are highly data-dependent, the speech datasets available online are also discussed in detail. Finally, the online toolkits, resources, and language models that can help in building an ASR system are presented. In this study, we captured every aspect that can impact the performance of an ASR system. Hence, we believe this work is a good starting point for academics interested in ASR research.
Pages: 9411-9457
Page count: 47