A new approach for the adaptation of HMMs to reverberation and background noise

Cited by: 43

Authors:

Hirsch, Hans-Guenter [1]

Finster, Harald [1]

Affiliations:

[1] Niederrhein Univ Appl Sci, Dept Elect Engn & Comp Sci, D-47805 Krefeld, Germany

Source:

SPEECH COMMUNICATION | 2008 / Vol. 50 / Issue 03

Keywords:

robust speech recognition; HMM adaptation; hands-free speech input; reverberation;

DOI:

10.1016/j.specom.2007.09.004

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Looking at practical application scenarios of speech recognition systems several distortion effects exist that have a major influence on the speech signal and can considerably deteriorate the recognition performance. So far, mainly the influence of stationary background noise and of unknown frequency characteristics has been studied. A further distortion effect is the hands-free speech input in a reverberant room environment. A new approach is presented to adapt the energy and spectral parameters of HMMs as well as their time derivatives to the modifications by the speech input in a reverberant environment. The only parameter, needed for the adaptation, is an estimate of the reverberation time. The usability of this adaptation technique is shown by presenting the improvements for a series of recognition experiments on reverberant speech data. The approach for adapting the time derivatives of the acoustic parameters can be applied in general for all different types of distortions and is not restricted to the case of a hands-free input. The use of a hands-free speech input comes along with the recording of any background noise that is present in the room. Thus there exists the need of combining the adaptation to reverberant conditions with the adaptation to background noise and unknown frequency characteristics. A combined adaptation scheme for all mentioned effects is presented in this paper. The adaptation is based on an estimation of the noise characteristics before the beginning of speech is detected. The estimation of the distortion parameters is based on signal processing techniques. The applicability is demonstrated by showing the improvements on artificially distorted data as well as on real recordings in rooms. (c) 2007 Elsevier B.V. All rights reserved.

Pages: 244-263

Page count: 20

References: 41 in total

[1] [Anonymous], 2002, INTERSPEECH DENV US

[2] Avendano C, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P889, DOI 10.1109/ICSLP.1996.607744

[3] BITZER J, 1999, P ROB METH SPEECH RE, P171

[4] CAMPOSNETO S, 1999, INT J SPEECH TECHNOL, P259

[5] COUVREUR L, 2001, P INT WORKSH AD METH

[6] *ETSI, 2003, 202 050 V113 ETSI ES

[7] FINSTER H, 2005, WEB INTERFACE EXPERI

[8] GADRUDADRI H, 2002, P ICSLP, P21

[9] Robust continuous speech recognition using parallel model combination [J]. Gales, MJF; Young, SJ. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (05): 352-359

[10] GALES MJF, 1995, THESIS U CAMBRIDGE G
