ROBUST SPECTRO-TEMPORAL FEATURES BASED ON AUTOREGRESSIVE MODELS OF HILBERT ENVELOPES

Cited by: 10
Authors
Ganapathy, Sriram [1]
Thomas, Samuel [1]
Hermansky, Hynek [1]
Affiliations
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
Frequency domain linear prediction (FDLP); Hilbert Envelopes; Robust spectro-temporal features; Phoneme recognition;
DOI
10.1109/ICASSP.2010.5495668
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we present a robust spectro-temporal feature extraction technique using autoregressive (AR) models of sub-band Hilbert envelopes. The AR models of the Hilbert envelopes are derived using frequency domain linear prediction (FDLP). From the sub-band Hilbert envelopes, spectral features are derived by integrating the envelopes in short-term frames, and temporal features are formed by converting the envelopes into modulation frequency components. The spectral and temporal feature streams are then combined at the phoneme posterior level and used as the input features for a recognition system. For the proposed features, robustness is achieved by using novel techniques of noise compensation and gain normalization. Phoneme recognition experiments on telephone speech in the HTIMIT database show significant performance improvements for the proposed features when compared to other robust feature techniques (average relative reduction of 10.6% in phoneme error rate). In addition to the overall phoneme recognition rates, the performance with broad phonetic classes is also reported.
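As a rough illustration of the FDLP idea named in the abstract, the sketch below fits an AR model to the DCT of a signal, reads the model's power response as a smooth estimate of the squared Hilbert envelope over time, and sums that envelope in short frames. It is a minimal sketch under stated assumptions, not the authors' implementation: it works on the full band rather than on sub-bands, it omits the noise compensation, gain normalization, and modulation-frequency features, and the function names, model order, and frame sizes (`dct_ii`, `lpc_autocorr`, `fdlp_envelope`, `frame_energies`, `order=20`, and so on) are hypothetical choices made for the example.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II via an explicit cosine matrix (fine for short signals)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c = basis @ x * np.sqrt(2.0 / n)
    c[0] /= np.sqrt(2.0)
    return c

def lpc_autocorr(seq, order):
    """Autocorrelation-method linear prediction; returns (a, gain) with a[0] == 1."""
    n = len(seq)
    r = np.array([seq[: n - lag] @ seq[lag:] for lag in range(order + 1)])
    r[0] *= 1.0 + 1e-9  # tiny diagonal loading for numerical safety (assumption)
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coeffs = np.linalg.solve(r[idx], -r[1:])  # Toeplitz normal equations
    a = np.concatenate(([1.0], coeffs))
    gain = r[0] + coeffs @ r[1:]  # prediction error power
    return a, gain

def fdlp_envelope(x, order=20):
    """AR estimate of the squared Hilbert envelope, one value per time sample."""
    n = len(x)
    a, gain = lpc_autocorr(dct_ii(x), order)
    # Time sample t corresponds to angle pi * (t + 0.5) / n in the AR response.
    omega = np.pi * (np.arange(n) + 0.5) / n
    response = np.exp(-1j * np.outer(omega, np.arange(order + 1))) @ a
    return gain / np.abs(response) ** 2

def frame_energies(envelope, frame_len, hop):
    """Integrate the envelope over short-term frames (spectral-feature style)."""
    starts = range(0, len(envelope) - frame_len + 1, hop)
    return np.array([envelope[s : s + frame_len].sum() for s in starts])

# Synthetic test signal: a tone burst centred at sample 300 plus weak noise.
rng = np.random.default_rng(0)
n = 1000
t = np.arange(n)
burst = np.exp(-0.5 * ((t - 300) / 20.0) ** 2) * np.cos(0.25 * np.pi * t)
signal = burst + 0.01 * rng.standard_normal(n)

env = fdlp_envelope(signal, order=20)
peak = int(np.argmax(env))
feats = frame_energies(env, frame_len=200, hop=100)
print("envelope peak near sample", peak, "| number of frames:", len(feats))
```

Because linear prediction is applied in the DCT domain, the AR "spectrum" is indexed by time rather than by frequency, which is why the peak of `env` tracks the location of the burst.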
Pages: 4286-4289
Number of pages: 4
Related papers
17 in total
[1]  
[Anonymous], 1994, Connectionist Speech Recognition: A Hybrid Approach
[2]  
[Anonymous], 2002, ETSI ES
[3]   Autoregressive modeling of temporal envelopes [J].
Athineos, Marios ;
Ellis, Daniel P. W. .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (11) :5237-5245
[4]   EFFECT OF REDUCING SLOW TEMPORAL MODULATIONS ON SPEECH RECEPTION [J].
DRULLMAN, R ;
FESTEN, JM ;
PLOMP, R .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95 (05) :2670-2680
[5]   PERCEPTUAL LINEAR PREDICTIVE (PLP) ANALYSIS OF SPEECH [J].
HERMANSKY, H .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1990, 87 (04) :1738-1752
[6]  
Hermansky H., 2005, Proc. of Interspeech 2005, P361
[7]   Robust speech recognition using the modulation spectrogram [J].
Kingsbury, BED ;
Morgan, N ;
Greenberg, S .
SPEECH COMMUNICATION, 1998, 25 (1-3) :117-132
[8]   Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications [J].
Kumaresan, R ;
Rao, A .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 105 (03) :1912-1924
[9]   LINEAR PREDICTION - TUTORIAL REVIEW [J].
MAKHOUL, J .
PROCEEDINGS OF THE IEEE, 1975, 63 (04) :561-580
[10]  
Marple SL, 1999, IEEE T SIGNAL PROCES, V47, P2600, DOI 10.1109/78.782222