Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system

Cited by: 18
Authors
Aggarwal, R. K. [1]
Dave, M. [1]
Affiliation
[1] NIT, Dept Comp Engn, Kurukshetra, Haryana, India
Keywords
ASR; HMM; MFCC; PLP; RASTA; Gaussian mixtures; Gravity centroids; Hindi; REPRESENTATIONS; COEFFICIENTS; HMMS
DOI
10.1007/s11235-011-9623-0
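Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract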
State-of-the-art automatic speech recognition (ASR) systems follow a well-established statistical paradigm: parameterization of the speech signal (a.k.a. feature extraction) at the front end and likelihood evaluation of the resulting feature vectors at the back end. For feature extraction, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are the two dominant signal processing methods used in ASR. Although the effects of each technique have been analyzed individually, it is not known whether combining the two can improve recognition accuracy. This paper investigates the possibility of integrating different types of features, such as MFCC, PLP and gravity centroids, to improve ASR performance for the Hindi language. Our experimental results show a significant improvement for a few such combinations when applied to medium-size lexicons under typical field conditions.
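The abstract does not spell out how the feature streams are combined. The following is a minimal sketch, assuming that "sequential combination" means frame-synchronous concatenation of per-frame feature vectors, with the combined vector then scored by a diagonal-covariance Gaussian mixture, as in one HMM state at the back end. The helper names `concat_streams` and `diag_gmm_loglik` are hypothetical. Feature extraction itself (MFCC, PLP, gravity centroids) is not shown, and the toy vectors are stand-ins rather than data from the paper.

```python
import math
from typing import List, Sequence

Frame = List[float]


def concat_streams(*streams: Sequence[Sequence[float]]) -> List[Frame]:
    """Frame-synchronous concatenation of several feature streams (hypothetical helper).

    Each stream is a sequence of per-frame vectors (e.g. MFCC, PLP or
    gravity-centroid features computed with the same frame shift).
    Frame t of the output is frame t of every stream, joined in order.
    """
    if not streams:
        raise ValueError("need at least one feature stream")
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("streams must have the same number of frames")
    return [[v for s in streams for v in s[t]] for t in range(n_frames)]


def diag_gmm_loglik(x: Sequence[float],
                    weights: Sequence[float],
                    means: Sequence[Sequence[float]],
                    variances: Sequence[Sequence[float]]) -> float:
    """log p(x) under a diagonal-covariance Gaussian mixture (hypothetical helper)."""
    if any(len(mu) != len(x) for mu in means):
        raise ValueError("mean vectors must match the feature dimension")
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            # Add log N(xi; m, v) for one dimension.
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comp_logs.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))


if __name__ == "__main__":
    # Toy stand-ins: 2 frames of 3-dim "MFCC", 2-dim "PLP"
    # and 1-dim "gravity centroid" vectors.
    mfcc = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    plp = [[0.1, 0.2], [0.3, 0.4]]
    centroids = [[7.0], [8.0]]

    frames = concat_streams(mfcc, plp, centroids)
    print(frames[0])  # [1.0, 2.0, 3.0, 0.1, 0.2, 7.0]

    # Two-component mixture over the 6-dim combined vector.
    dim = len(frames[0])
    weights = [0.6, 0.4]
    means = [frames[0], [0.0] * dim]
    variances = [[1.0] * dim, [2.0] * dim]
    print(diag_gmm_loglik(frames[0], weights, means, variances))
```

Concatenation increases the feature dimension, so each mixture component carries more mean and variance parameters. Streams computed with different frame shifts would need to be aligned or resampled before they could be joined this way.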
Pages: 1457-1466
Number of pages: 10