Recognition of isolated words using Zernike and MFCC features for audio visual speech recognition

Cited by: 27
Authors
Borde, Prashant [1]
Varpe, Amarsinh [1]
Manza, Ramesh [2]
Yannawar, Pravin [1]
Affiliations
[1] Dr Babasaheb Ambedkar Marathwada Univ, Dept Comp Sci & IT, Vis & Intelligent Syst Lab, Aurangabad, Maharashtra, India
[2] Dr Babasaheb Ambedkar Marathwada Univ, Dept Comp Sci & IT, Biomed Image Proc Lab, Aurangabad, Maharashtra, India
Keywords
Lip tracking; Zernike moment; Principal component analysis (PCA); Mel frequency cepstral coefficients (MFCC)
DOI
10.1007/s10772-014-9257-1
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Automatic speech recognition by machine is an active research topic in signal processing and has attracted contributions from many researchers. In recent years, automatic speech reading systems have advanced considerably by combining audio and visual speech features to recognize words under noisy conditions. The objective of an audio-visual speech recognition system is to improve recognition accuracy. In this paper, we computed visual features using Zernike moments and audio features using mel frequency cepstral coefficients (MFCC) on the visual vocabulary of independent standard words dataset, which contains isolated utterances of city names by ten speakers. The visual features were normalized, and the dimensionality of the feature set was reduced by principal component analysis (PCA) so that isolated word utterances could be recognized in the PCA space. Recognition of isolated words based on visual-only and audio-only features achieved a recognition performance of 63.88% and 100%, respectively.
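As an illustration of the visual feature named in the abstract, the following is a minimal sketch of a Zernike moment computed on a square grayscale image mapped onto the unit disk. It is not the authors' implementation: the function names `radial_poly` and `zernike_moment`, the pixel-center sampling, and the choice of orders are assumptions made here for illustration, since the abstract does not state the orders, normalization, or lip-region extraction that were used.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); assumes |m| <= n and n - |m| even."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)) / (
            math.factorial(s)
            * math.factorial((n + m) // 2 - s)
            * math.factorial((n - m) // 2 - s)
        )
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_moment(image, n, m):
    """Approximate Z_nm of a square image (list of equal-length rows).

    Pixel centers are mapped to [-1, 1] x [-1, 1]; pixels outside the
    unit disk are ignored (an assumption of this sketch).
    """
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    size = len(image)
    step = 2.0 / size  # side length of one pixel in disk coordinates
    acc = 0j
    for row in range(size):
        y = -1.0 + (row + 0.5) * step
        for col in range(size):
            x = -1.0 + (col + 0.5) * step
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue
            theta = math.atan2(y, x)
            # Project onto the complex conjugate of V_nm = R_nm * exp(j*m*theta).
            acc += image[row][col] * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * acc * step * step


def zernike_features(image, max_order):
    """Magnitudes |Z_nm| for all valid (n, m) with 0 <= m <= n <= max_order."""
    return [
        abs(zernike_moment(image, n, m))
        for n in range(max_order + 1)
        for m in range(n % 2, n + 1, 2)
    ]
```

The magnitudes `|Z_nm|` are unchanged when the image is rotated, which is one common reason for using them as shape descriptors. In a pipeline like the one the abstract describes, a vector such as `zernike_features(frame, 8)` per frame would then be normalized and reduced with PCA; the order 8 here is only a placeholder.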
Pages: 167-175
Number of pages: 9