Filterbank optimization for robust ASR using GA and PSO

Cited by: 20
Authors
Aggarwal, R.K. [1]
Dave, M. [1]
Affiliations
[1] Department of Computer Engineering, N.I.T., Kurukshetra
Keywords
Automatic speech recognition; Genetic algorithm; Hidden Markov models; Hindi; MFCC; Multilayer perceptrons; Particle swarm optimization
DOI
10.1007/s10772-012-9133-9
Abstract
Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front end and likelihood evaluation of feature vectors at the back end. Mel-frequency cepstral coefficients (MFCCs), the features widely used in state-of-the-art ASR systems, are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In MFCC filterbank analysis there is no consensus on the spacing and number of filters to use across noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front end outperforms the conventional MFCC technique. All investigations are conducted using two separate classifiers, a hidden Markov model (HMM) and a multilayer perceptron (MLP), for Hindi vowel recognition in typical field conditions as well as in noisy environments. © Springer Science+Business Media, LLC 2012.
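To make the optimized quantities concrete, the following is a minimal sketch of a triangular filterbank defined entirely by a list of edge frequencies, so that each filter's left side, center, and right side are explicit parameters. It is an illustration only, not the authors' implementation: the function names, the common 2595·log10(1 + f/700) Mel formula, the 24-filter count, the 8 kHz sample rate, and the 512-point FFT are all assumptions, and the abstract does not specify the particle/chromosome encoding or the fitness function used by PSO and GA.

```python
import math

def hz_to_mel(f_hz):
    # Commonly used Mel-scale approximation (assumed; not stated in the abstract).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spaced_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def triangular_filterbank(edges_hz, n_fft, sample_rate):
    """Build one triangular filter per interior edge.

    Filter k uses edges k, k+1, k+2 as its left side, center, and right side,
    rising linearly to 1.0 at the center and falling back to 0.0.
    """
    n_bins = n_fft // 2 + 1
    bin_hz = [b * sample_rate / n_fft for b in range(n_bins)]
    bank = []
    for left, center, right in zip(edges_hz, edges_hz[1:], edges_hz[2:]):
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))
            elif center < f < right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_filterbank_energies(power_spectrum, bank, floor=1e-10):
    """Weighted sum of the power spectrum per filter, then natural log (floored)."""
    return [
        math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), floor))
        for row in bank
    ]

# Conventional Mel-spaced starting point (hypothetical settings).
edges = mel_spaced_edges(24, 0.0, 4000.0)
bank = triangular_filterbank(edges, n_fft=512, sample_rate=8000)
```

Under these assumptions, a search method such as PSO or GA could treat the sorted `edges` list as the candidate solution, perturb it, rebuild the filterbank, and score each candidate by recognition accuracy on held-out data.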
Pages: 191-201
Number of pages: 10