Use of spectral autocorrelation in spectral envelope linear prediction for speech recognition

Times cited: 0
Authors
Kim, HK [1]
Lee, HS [1]
Affiliation
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Seoul 130012, South Korea
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1999 / Vol. 7 / No. 5
Keywords
feature extraction; linear prediction; spectral autocorrelation; spectral envelope; speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes a linear predictive (LP) analysis method in which sample autocorrelations are estimated from the spectral envelope of the speech signal on the basis of the spectral autocorrelation. The spectral autocorrelation is defined as a set of discrete quantities of the speech spectrum whose spectral resolution is identical to that of the discrete Fourier transform (DFT) used to obtain the speech spectrum. From analytical and empirical derivations of its properties, we can estimate the fundamental frequency for voiced speech and the maximally correlated frequency for unvoiced speech, and then obtain the spectral envelope by sampling the spectrum at intervals of the estimated frequency. A frequency normalization can be applied to the estimated spectral envelope because the number of samples in the spectral envelope usually differs from frame to frame. The spectral envelope is warped onto the mel-frequency scale, and the inverse DFT is applied to obtain estimates of the sample autocorrelations. From the result of LP analysis on these sample autocorrelations, we finally obtain the spectral envelope cepstral coefficients (SECC). Hidden Markov model (HMM) recognition experiments show that SECC significantly improves recognizer performance at low signal-to-noise ratios (SNRs) compared with several other representations.
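The processing chain in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: all function names are hypothetical, the normalization of the spectral autocorrelation, the peak search above a minimum lag, and plain decimation of the magnitude spectrum at the peak lag are assumptions, the decimated envelope is treated as if it spanned [0, pi] uniformly, and the mel-frequency warping and frequency-normalization steps are omitted. The LP and cepstrum stages use the standard Levinson-Durbin and LPC-to-cepstrum recursions.

```python
import math


def spectral_autocorrelation(mag, max_lag):
    """Autocorrelation of a magnitude spectrum along the frequency axis.

    Lags are in DFT bins, so the resolution matches the DFT that produced `mag`.
    Values are normalized by the zero-lag energy (an assumed normalization).
    """
    energy = sum(m * m for m in mag)
    return [sum(mag[n] * mag[n + k] for n in range(len(mag) - k)) / energy
            for k in range(max_lag + 1)]


def peak_lag(r, min_lag):
    """Bin lag of the largest spectral autocorrelation at or beyond min_lag.

    For voiced frames this approximates the fundamental frequency in bins;
    for unvoiced frames it is simply the maximally correlated lag.
    """
    return max(range(min_lag, len(r)), key=lambda k: r[k])


def autocorr_from_envelope(env, n_lags):
    """Inverse DFT of the power spectrum env**2, treating env as L >= 2
    uniformly spaced magnitude samples on [0, pi] of a real, even spectrum
    (a simplifying assumption)."""
    L = len(env)
    M = 2 * (L - 1)  # length of the full symmetric spectrum
    power = [e * e for e in env]
    r = []
    for m in range(n_lags + 1):
        s = power[0] + power[-1] * (-1) ** m
        s += 2 * sum(power[i] * math.cos(2 * math.pi * i * m / M)
                     for i in range(1, L - 1))
        r.append(s / M)
    return r


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns the coefficients of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p and the final prediction-error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c1..c_n of the all-pole model 1/A(z) via the standard recursion."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def secc_like_features(mag, order=10, n_ceps=12, min_lag=3):
    """Hypothetical end-to-end sketch: spectral autocorrelation -> peak lag ->
    envelope by decimation -> autocorrelation -> LP -> cepstrum.
    Mel warping and frequency normalization are omitted."""
    r_spec = spectral_autocorrelation(mag, len(mag) // 2)
    step = peak_lag(r_spec, min_lag)
    env = mag[::step]  # spectrum sampled every `step` bins, starting at DC
    r = autocorr_from_envelope(env, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a voiced frame, decimating at the peak lag keeps roughly one sample per harmonic, so the envelope length depends on the pitch; that frame-to-frame variation is what the frequency normalization described above is meant to handle.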
Pages: 533-541
Number of pages: 9