Harmonicity-based blind dereverberation for single-channel speech signals

Cited by: 45
Authors
Nakatani, Tomohiro [1]
Kinoshita, Keisuke [1]
Miyoshi, Masato [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / Issue 01
Keywords
adaptive harmonic filter; blind signal processing; dereverberation; harmonicity; inverse filter; speech signal; time warping;
DOI
10.1109/TASL.2006.872620
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The distant acquisition of acoustic signals in an enclosed space often produces reverberant artifacts due to the room impulse response. Speech dereverberation is desirable in situations where the distant acquisition of acoustic signals is involved. These situations include hands-free speech recognition, teleconferencing, and meeting recording, to name a few. This paper proposes a processing method, named Harmonicity-based dEReverBeration (HERB), to reduce the amount of reverberation in the signal picked up by a single microphone. The method makes extensive use of harmonicity, a unique characteristic of speech, in the design of a dereverberation filter. In particular, harmonicity enhancement is proposed and demonstrated as an effective way of estimating a filter that approximates an inverse filter corresponding to the room impulse response. Two specific harmonicity enhancement techniques are presented and compared; one based on an average transfer function and the other on the minimization of a mean squared error function. Prototype HERB systems are implemented by introducing several techniques to improve the accuracy of dereverberation filter estimation, including time warping analysis. Experimental results show that the proposed methods can achieve high-quality speech dereverberation, when the reverberation time is between 0.1 and 1.0 s, in terms of reverberation energy decay curves and automatic speech recognition accuracy.
Pages: 80-95
Page count: 16