Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model

Cited by: 42

Authors:

Nakatani, Tomohiro [1]

Juang, Biing-Hwang [2]

Yoshioka, Takuya [1]

Kinoshita, Keisuke [1]

Delcroix, Marc [1]

Miyoshi, Masato [1]

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2008, Vol. 16, No. 8

Keywords:

Blind signal processing; dereverberation; maximum-likelihood (ML) estimation; multichannel linear prediction; speech; time-varying Gaussian source model (TVGSM)

DOI:

10.1109/TASL.2008.2004306

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is generally desirable when the signal is acquired through distant microphones in applications such as hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A time-varying Gaussian source model (TVGSM) is introduced to represent the dynamic short-time characteristics of nonreverberant speech segments, including the time and frequency structures of the speech spectrum. With this model, dereverberation of the speech signal is formulated as a maximum-likelihood (ML) problem based on multichannel linear prediction, in which the speech signal is recovered by transforming the observed signal into one that is probabilistically more like nonreverberant speech. We first present a general ML solution based on TVGSM, and then derive several dereverberation algorithms based on various source models. Specifically, we present a source model consisting of a finite number of states, each of which is manifested by a short-time speech spectrum defined by a corresponding autocorrelation (AC) vector. The dereverberation algorithm based on this model involves a finite collection of spectral patterns that form a codebook. We confirm experimentally that both the time and frequency characteristics represented in the source models are very important for speech dereverberation, and that the prior knowledge represented by the codebook allows us to further improve the dereverberated speech quality. We also confirm that the quality of reverberant speech signals can be greatly improved, in terms of spectral shape and energy time-pattern distortions, using only a short speech signal and a speaker-independent codebook.
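The ML formulation described in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's algorithm: it is a simplified single-channel, time-domain illustration that assumes a zero-mean Gaussian source whose variance changes over time, and alternates between (a) estimating that variance from the short-time power of the current estimate and (b) solving a variance-weighted least-squares problem for the linear prediction filter. The function name `tv_gaussian_lp_dereverb` and the parameters `order`, `delay`, `n_iter`, `frame`, and `eps` are hypothetical choices made for this example; the paper itself uses multichannel prediction and richer source models, including the codebook of spectral patterns.

```python
import numpy as np


def tv_gaussian_lp_dereverb(x, order=20, delay=1, n_iter=3, frame=64, eps=1e-8):
    """Illustrative single-channel sketch (an assumption, not the paper's method).

    Minimizes sum_t [log(lam_t) + d_t**2 / lam_t] by alternating updates, where
    d_t = x_t - sum_k g_k * x_{t - delay - k} is the prediction residual and
    lam_t is the time-varying source variance.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n <= order + delay:
        raise ValueError("signal is too short for the chosen order and delay")

    # Regression matrix: column k holds the signal delayed by (delay + k) samples.
    X = np.zeros((n, order))
    for k in range(order):
        shift = delay + k
        X[shift:, k] = x[: n - shift]

    d = x.copy()          # current estimate of the nonreverberant signal
    g = np.zeros(order)   # prediction filter coefficients
    kernel = np.ones(frame) / frame
    for _ in range(n_iter):
        # (a) Variance update: smoothed short-time power of the current estimate.
        lam = np.maximum(np.convolve(d ** 2, kernel, mode="same"), eps)
        # (b) Filter update: least squares weighted by the inverse variance.
        Xw = X / lam[:, None]
        R = X.T @ Xw
        p = Xw.T @ x
        g = np.linalg.solve(R + eps * np.eye(order), p)
        d = x - X @ g
    return d, g
```

Calling `d, g = tv_gaussian_lp_dereverb(x)` on a one-dimensional signal `x` returns the residual `d` as the dereverberated estimate together with the filter `g`. Weighting each sample by the inverse of its estimated variance is what distinguishes this from ordinary linear prediction: low-energy segments, where reverberant tails dominate, contribute more strongly to the filter estimate.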


Pages: 1512-1527

Page count: 16
