Filterbank optimization for robust ASR using GA and PSO

Cited by: 20
Authors
Aggarwal, R.K. [1]
Dave, M. [1]
Affiliation
[1] Department of Computer Engineering, N.I.T., Kurukshetra
Keywords
Automatic speech recognition; Genetic algorithm; Hidden Markov models; Hindi; MFCC; Multilayer perceptrons; Particle swarm optimization
DOI
10.1007/s10772-012-9133-9
Abstract
Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front end and likelihood evaluation of feature vectors at the back end. Mel-frequency cepstral coefficients (MFCCs), the features widely used in state-of-the-art ASR systems, are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis of MFCC, there is no consensus on the spacing and number of filters to use across noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front end outperforms the conventional MFCC technique. All investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments. © Springer Science+Business Media, LLC 2012.
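The idea in the abstract, treating the central and side frequencies of the triangular filters as tunable parameters, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, the common 2595·log10(1 + f/700) mel formula, and a basic global-best PSO with inertia weight. The abstract does not state the fitness function, so a toy objective (distance to a hypothetical target layout) stands in for whatever recognition-based score the paper uses. All function names and parameter values are illustrative, and the GA variant is not shown.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_edges(n_filters, f_low, f_high):
    """Conventional layout: n_filters + 2 edge frequencies equally spaced in mel."""
    return mel_to_hz(np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2))

def triangular_filterbank(edges_hz, n_fft, sample_rate):
    """Filter i rises from edges[i], peaks at edges[i+1], falls to edges[i+2]."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((len(edges_hz) - 2, freqs.size))
    for i in range(len(edges_hz) - 2):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / max(mid - lo, 1e-9)   # guard against zero width
        falling = (hi - freqs) / max(hi - mid, 1e-9)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def pso_minimize(fitness, init, lower, upper, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over sorted edge-frequency vectors (illustrative)."""
    rng = np.random.default_rng(seed)
    dim = init.size
    noise = rng.normal(0.0, 0.05 * (upper - lower), (n_particles, dim))
    pos = np.clip(init + noise, lower, upper)
    pos[0] = init                      # keep the conventional layout as one particle
    pos = np.sort(pos, axis=1)         # edges must stay in increasing order
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.sort(np.clip(pos + vel, lower, upper), axis=1)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Usage with a toy objective; a real system would score recognition performance.
start = mel_edges(n_filters=10, f_low=0.0, f_high=4000.0)
target = np.linspace(0.0, 4000.0, 12)          # hypothetical target layout

def toy_fitness(edges):
    return float(np.mean((edges - target) ** 2))

best_edges, best_val = pso_minimize(toy_fitness, start, 0.0, 4000.0)
bank = triangular_filterbank(best_edges, n_fft=512, sample_rate=8000)
```

Sorting each particle after every update is one simple way to keep the filters ordered; other constraint-handling schemes would work equally well in this sketch.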
Pages: 191-201
Page count: 10