Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization

Cited by: 0

Authors:

Ueda, Yuma [1]

Wang, Longbiao [2]

Kai, Atsuhiko [1]

Xiao, Xiong [3]

Chng, Eng Siong [3]

Li, Haizhou [4]

Affiliations:

[1] Shizuoka Univ, Grad Sch Engn, Hamamatsu, Shizuoka 4328561, Japan

[2] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan

[3] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore

[4] Inst Infocomm Res, Human Language Technol, Singapore, Singapore

Source:

2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP) | 2014

Keywords:

speech recognition; dereverberation; denoising autoencoder; environment adaptation; distant-talking speech; spectral subtraction; reverberation; adaptation; domain; model

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) with a temporal structure normalization (TSN) filter. In the proposed method, a DAE is first applied in the cepstral domain of the speech to suppress reverberation. A TSN filter is then applied as postprocessing to reduce the remaining noise and reverberation effects by normalizing the modulation spectra of the features to reference spectra of clean speech. The proposed method was evaluated using speech in simulated and real reverberant environments. By combining the cepstral-domain DAE and TSN, the average word error rate (WER) was reduced from 25.2% for the baseline system to 21.2% in simulated environments, and from 47.5% to 41.3% in real environments.
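The WER figures quoted in the abstract can also be read as relative reductions. The short Python sketch below is only an illustration of that arithmetic: the function name `relative_wer_reduction` and the `results` dictionary are hypothetical helpers introduced here, not part of the paper, and the only inputs are the four percentages stated in the abstract.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent (hypothetical helper)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Average WERs (in percent) as quoted in the abstract: (baseline, DAE + TSN).
results = {
    "simulated": (25.2, 21.2),
    "real": (47.5, 41.3),
}

for condition, (baseline, proposed) in results.items():
    absolute = baseline - proposed  # reduction in percentage points
    relative = relative_wer_reduction(baseline, proposed)
    print(f"{condition}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```

Under this reading, the improvement is roughly 4.0 points (about 15.9% relative) in simulated environments and 6.2 points (about 13.1% relative) in real environments.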

Pages: 379+

Page count: 3
