A COMPARISON OF SIGNAL-PROCESSING FRONT-ENDS FOR AUTOMATIC WORD RECOGNITION

Cited: 62
Authors
JANKOWSKI, CR
VO, HDH
LIPPMANN, RP
Affiliations
[1] Lincoln Laboratory and the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Lexington
[2] Lincoln Laboratory and the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, PCSI Inc., Lexington
[3] Lincoln Laboratory, Massachusetts Institute of Technology, Lexington
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1995 / Vol. 3 / No. 4
DOI
10.1109/89.397093
Chinese Library Classification
O42 [Acoustics];
Subject Classification
070206; 082403
Abstract
This paper compares the word error rate of a speech recognizer using several signal-processing front ends based on auditory properties. The front ends were compared with a control mel-filter-bank (MFB) cepstral front end on clean speech and on speech degraded by noise and spectral variability, using the TI-105 isolated-word database. MFB recognition error rates ranged from 0.5 to 26.9% in noise, depending on the SNR, and auditory models provided error rates as much as four percentage points lower. With speech degraded by linear filtering, MFB error rates ranged from 0.5 to 3.1%, and the reduction in error rate provided by auditory models was less than 0.5 percentage points. Some earlier studies that demonstrated considerably more improvement with auditory models used linear predictive coding (LPC) based control front ends. This paper shows that MFB cepstra significantly outperform LPC cepstra under noisy conditions. Techniques using an optimal linear combination of features for data reduction were also evaluated.
Pages: 286-293
Number of pages: 8
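
The control front end described in the abstract is a mel-filter-bank (MFB) cepstral representation. As a rough illustration of how such features are typically computed, here is a minimal NumPy sketch; the parameter choices (16 kHz sampling, 25 ms Hamming frames with a 10 ms hop, 24 triangular mel filters, 12 cepstral coefficients) and all function names are assumptions for illustration, not the configuration used in the paper.

import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with center frequencies spaced evenly on the mel scale.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):
            fbank[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fbank[i, k] = (hi - k) / max(hi - mid, 1)
    return fbank

def mfb_cepstra(signal, sample_rate=16000, frame_len=400, hop=160,
                n_filters=24, n_ceps=12):
    # Assumes the signal is at least one frame long.
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fbank = mel_filterbank(n_filters, frame_len, sample_rate)
    cepstra = np.zeros((n_frames, n_ceps))
    n = np.arange(n_filters)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, frame_len)) ** 2
        # Log mel filter-bank energies, floored to avoid log(0).
        log_mel = np.log(np.maximum(fbank @ power, 1e-10))
        # DCT-II of the log energies yields the cepstral coefficients.
        for q in range(n_ceps):
            cepstra[t, q] = np.sum(log_mel * np.cos(np.pi * q * (n + 0.5) / n_filters))
    return cepstra

if __name__ == "__main__":
    # Example: cepstra for one second of synthetic audio.
    audio = np.random.default_rng(0).standard_normal(16000)
    print(mfb_cepstra(audio).shape)  # (98, 12)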