Environment-dependent denoising autoencoder for distant-talking speech recognition

Cited by: 13
Authors
Ueda, Yuma [1 ]
Wang, Longbiao [2 ]
Kai, Atsuhiko [1 ]
Ren, Bo [2 ]
Affiliations
[1] Shizuoka Univ, Grad Sch Engn, Naka Ku, Hamamatsu, Shizuoka 4328561, Japan
[2] Nagaoka Univ Technol, 1603-1 Kamitomioka, Nagaoka, Niigata 9402188, Japan
Source
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING | 2015
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment identification; Distant-talking speech; SPECTRAL SUBTRACTION; DEREVERBERATION; MODEL; REVERBERATION; ENHANCEMENT; DOMAIN; NOISE;
DOI
10.1186/s13634-015-0278-y
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because they can learn higher-level representations and a more flexible feature mapping function. However, a DAE performs poorly when the training and test environments are mismatched. In a conventional DAE, the parameters are trained on pairs of reverberant and clean speech collected under various acoustic conditions (that is, an environment-independent DAE). To address this problem, we propose two environment-dependent DAE approaches that reduce the influence of mismatches between training and test environments. In the first approach, we train separate DAEs using speech from different acoustic environments, and the DAE whose training condition best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we add estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE, or reverberation-aware DAE). The proposed methods are evaluated using speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For the two-step environment-dependent DAE, environment identification based on the proposed DNN approach also outperforms the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. Moreover, the one-step environment-dependent DAE significantly outperforms the two-step environment-dependent DAE.
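The one-step (reverberation-aware) variant described in the abstract amounts to appending blindly estimated reverberation features to the DAE input, so a single mapping can adapt to the acoustic condition without a separate environment classifier. Below is a minimal PyTorch sketch of that idea; the layer sizes, sigmoid activations, splice width, feature dimensions, and training loop are illustrative assumptions and are not taken from the paper.

# Minimal sketch of a reverberation-aware (one-step environment-dependent) DAE.
# All hyperparameters here are hypothetical, chosen only for illustration.
import torch
import torch.nn as nn

class ReverbAwareDAE(nn.Module):
    def __init__(self, feat_dim=39, splice=5, reverb_dim=10, hidden=1024):
        super().__init__()
        # Input: spliced reverberant frames plus estimated reverberation features.
        in_dim = feat_dim * (2 * splice + 1) + reverb_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, feat_dim),  # predict the clean center frame
        )

    def forward(self, spliced_reverberant, reverb_feats):
        # One-step variant: concatenate reverberation features with the input.
        x = torch.cat([spliced_reverberant, reverb_feats], dim=-1)
        return self.net(x)

# Training uses pairs of reverberant (input) and clean (target) features.
model = ReverbAwareDAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

spliced = torch.randn(32, 39 * 11)   # batch of spliced reverberant MFCC frames
reverb = torch.randn(32, 10)         # blindly estimated reverberation features
clean = torch.randn(32, 39)          # corresponding clean center frames

optimizer.zero_grad()
loss = criterion(model(spliced, reverb), clean)
loss.backward()
optimizer.step()

The two-step variant would instead train one such DAE per training environment (without the reverberation input) and use a separate DNN classifier, fed with reverberant speech and estimated reverberation, to select which DAE to apply at test time.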
Pages: 11