Temporal Modulation Spectral Restoration for Robust Speech Recognition

Citations: 0
Authors
Wang, Syu-Siang [1]
Tsao, Yu [2]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
Source
2016 IEEE SECOND INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM) | 2016
Keywords
temporal modulation spectral restoration; TMSR; noise estimation; generalized maximum a posteriori; normalization; equalization; noise
DOI
10.1109/BigMM.2016.91
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we propose a temporal modulation spectral restoration (TMSR) approach for robust feature extraction in automatic speech recognition. TMSR consists of three main functional blocks. First, cepstral mean and variance normalization (CMVN) is applied to the original feature sequence. Second, the noise characteristics are estimated by analyzing the normalized features. Third, a gain function is designed to attenuate noise and enhance speech components in the normalized features. In this study, TMSR employs a simple high-pass-filter noise estimation scheme and a gain function derived with the generalized maximum a posteriori (GMAP) algorithm. The proposed method was evaluated on two benchmark databases, Aurora-3 and Aurora-4. Results show that TMSR outperforms the baseline and several well-known robust feature extraction methods.
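The three blocks named in the abstract can be illustrated with a rough sketch for a single feature trajectory. This is not the authors' implementation: the abstract does not give the GMAP gain, so a Wiener-style gain is swapped in as a stand-in, and the noise estimate (mean power in the high modulation-frequency bins) is only one possible reading of "high-pass filter noise estimation". The function names and the `cutoff_bin` and `floor` parameters are hypothetical.

```python
import cmath
import math
from statistics import fmean, pstdev


def cmvn(x):
    """Block 1: normalize one feature trajectory to zero mean, unit variance."""
    mu = fmean(x)
    sigma = pstdev(x) or 1.0  # guard against a constant trajectory
    return [(v - mu) / sigma for v in x]


def dft(x):
    """Naive DFT along time; yields the (temporal) modulation spectrum."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of the time sequence."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def tmsr_sketch(x, cutoff_bin=8, floor=0.1):
    """Normalize, estimate noise, then apply a gain in the modulation domain.

    cutoff_bin and floor are hypothetical parameters, not from the paper.
    """
    z = cmvn(x)
    spec = dft(z)
    n = len(spec)
    power = [abs(c) ** 2 for c in spec]

    # Block 2 (assumption): treat the mean power above cutoff_bin as noise.
    high = [power[k] for k in range(n) if min(k, n - k) >= cutoff_bin]
    noise = fmean(high) if high else 0.0

    # Block 3 (stand-in for GMAP): Wiener-style gain with a lower floor.
    gains = [max(floor, 1.0 - noise / p) if p > 0 else floor for p in power]
    return idft([g * c for g, c in zip(gains, spec)])
```

Because every gain lies between `floor` and 1, the restored trajectory never carries more energy than the normalized one. In practice the same procedure would be applied to each feature dimension separately, and an FFT would replace the naive transform.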
Pages: 481-486
Page count: 6