Acoustic modeling problem for automatic speech recognition system: conventional methods (Part I)

Cited by: 14
Authors
Aggarwal, Rajesh [1]
Dave, Mayank [1]
Affiliations
[1] Natl Inst Technol, Kurukshetra, Haryana, India
Keywords
Acoustic models; ASR; HMM; Gaussian mixtures; Front end; Back end;
DOI
10.1007/s10772-011-9108-2
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
In automatic speech recognition (ASR) systems, the speech signal is captured and parameterized at the front end and evaluated at the back end using the statistical framework of the hidden Markov model (HMM). The performance of these systems depends critically on both the type of models used and the methods adopted for signal analysis. Researchers have proposed a variety of modifications and extensions for HMM-based acoustic models to overcome their limitations. In this review, we summarize most of the research work related to HMM-ASR that has been carried out during the last three decades. We present all these approaches under three categories, namely conventional methods, refinements and advancements of HMM. The review is presented in two parts (papers): (i) an overview of conventional methods for acoustic phonetic modeling, and (ii) refinements and advancements of acoustic models. Part I explores the architecture and working of the standard HMM along with its limitations. It also covers different modeling units, language models and decoders. Part II presents a review of the advances and refinements of the conventional HMM techniques, along with the current challenges and performance issues related to ASR.
Pages: 297-308
Number of pages: 12