A heterogeneous speech feature vectors generation approach with hybrid HMM classifiers

Cited by: 20
Authors
Kadyan V. [1]
Mantri A. [2]
Aggarwal R.K. [3]
Affiliations
[1] Department of Computer Science & Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab
[2] Department of Electronics & Communication Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab
[3] Department of Computer Engineering, N.I.T., Kurukshetra, Haryana
Keywords
Automatic speech recognition (ASR); Differential evolution (DE); Genetic algorithm (GA); Hidden Markov model (HMM); Mel-frequency cepstral coefficients (MFCC); Perceptual linear prediction (PLP); Relative spectral transform (RASTA)
DOI
10.1007/s10772-017-9446-9
Abstract
Automatic speech recognition (ASR) systems play a vital role in human–machine interaction. An ASR system faces performance degradation caused by inconsistency between the training and testing phases, which arises from the extraction and representation of erroneous, redundant feature vectors. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP, and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. These feature vectors are then passed to an optimization algorithm, which generates refined feature vectors in combination with a traditional statistical technique. The approach uses GA + HMM and DE + HMM techniques to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement with MFCC and the DE + HMM technique compared with RASTA-PLP and PLP using hybrid HMM classifiers. © 2017, Springer Science+Business Media, LLC.
Pages: 761-769
Page count: 8
Related papers
27 in total
[1] Aggarwal R.K., Dave M., Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system, Telecommunication Systems, 52, 3, pp. 1457-1466, (2013)
[2] Alam M.J., Kenny P., Dumouchel P., O'Shaughnessy D., Robust feature extractors for continuous speech recognition, 22nd European Signal Processing Conference (EUSIPCO), pp. 944-948, (2014)
[3] Alam M.J., Kinnunen T., Kenny P., Ouellet P., O'Shaughnessy D., Multitaper MFCC and PLP features for speaker verification using i-vectors, Speech Communication, 55, 2, pp. 237-251, (2013)
[4] Bengio Y., Grandvalet Y., No unbiased estimator of the variance of k-fold cross-validation, Journal of Machine Learning Research, 5, pp. 1089-1105, (2004)
[5] Chang E.I., Lippmann R., Tong D.W., Using genetic algorithms to improve pattern classification performance, NIPS, pp. 797-803, (1990)
[6] Clemente I.A., Heckmann M., Wrede B., Incremental word learning: Efficient HMM initialization and large margin discriminative adaptation, Speech Communication, 54, 9, pp. 1029-1048, (2012)
[7] Davis S., Mermelstein P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, 28, pp. 357-366, (1980)
[8] Dua M., Aggarwal R.K., Kadyan V., Dua S., Punjabi speech to text system for connected words, Fourth International Conference on Advances in Recent Technologies in Communication and Computing, pp. 206-209, (2012)
[9] Dua M., Aggarwal R.K., Kadyan V., Dua S., Punjabi automatic speech recognition using HTK, International Journal of Computer Science Issues, 9, 4, pp. 359-364, (2012)
[10] Ganapathiraju A., Support vector machines for speech recognition, (2002)