A Closer Look on Hierarchical Spectro-Temporal Features (HIST)

Cited by: 0
Authors
Heckmann, Martin [1]
Domont, Xavier [1]
Joublin, Frank [1]
Goerick, Christian [1]
Affiliation
[1] Honda Res Inst Europe GmbH, D-63073 Offenbach, Germany
Source
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008
Keywords
Spectro-temporal; auditory; robust speech recognition; non-linear smoothing
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech recognition robust against interfering noise remains a difficult task. We previously presented a set of spectro-temporal speech features, termed Hierarchical Spectro-Temporal (HIST) features, which showed improved robustness, especially when combined with RASTA-PLP. They are inspired by the receptive fields found in the mammalian auditory cortex and are organized in two hierarchical levels. A set of filters learned via ICA captures local variations and constitutes the first layer of the hierarchy. In the second layer these local variations are combined to form larger receptive fields learned via Non-Negative Sparse Coding. In this paper we introduce a non-linear smoothing along the time axis of the spectrograms at the input to the hierarchy and present a more thorough performance analysis on an isolated and a continuous digit recognition task. The results show that the combination of HIST and RASTA-PLP features yields improved recognition scores in noise.
Pages: 894 / 897
Page count: 4