Reverberation and Noise Robust Feature Compensation Based on IMM

Cited by: 9
Authors
Han, Chang Woo [1 ,2 ]
Kang, Shin Jae [1 ,2 ]
Kim, Nam Soo [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Sch Elect Engn, Seoul 151, South Korea
[2] Seoul Natl Univ, INMC, Seoul 151, South Korea
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 8
Funding
National Research Foundation of Singapore;
Keywords
Dereverberation; feature compensation; interacting multiple model (IMM); MAXIMUM-LIKELIHOOD; SPEECH; ADAPTATION; ALGORITHM;
DOI
10.1109/TASL.2013.2256893
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, we propose a novel feature compensation approach based on the interacting multiple model (IMM) algorithm specially designed for joint processing of background noise and acoustic reverberation. Our approach to cope with the time-varying environmental parameters is to establish a switching linear dynamic model for the additive and convolutive distortions, such as the background noise and acoustic reverberation, in the log-spectral domain. We construct multiple state space models with the speech corruption process in which the log spectra of clean speech and log frequency response of acoustic reverberation are jointly handled as the state of our interest. The proposed approach shows significant improvements in the Aurora-5 automatic speech recognition (ASR) task which was developed to investigate the influence on the performance of ASR for a hands-free speech input in noisy room environments.
Pages: 1598-1611
Page count: 14