Environment-dependent denoising autoencoder for distant-talking speech recognition

Cited: 0
Authors
Yuma Ueda
Longbiao Wang
Atsuhiko Kai
Bo Ren
Affiliations
[1] Shizuoka University, Graduate School of Engineering
[2] Nagaoka University of Technology
Source
EURASIP Journal on Advances in Signal Processing, Vol. 2015
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment identification; Distant-talking speech;
DOI
Not available
Abstract
In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because higher-level representations and increased flexibility of the feature mapping function can be learned. However, a DAE is not adequate when the training and test environments are mismatched. In a conventional DAE, parameters are trained using pairs of reverberant speech and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address this problem, we propose two environment-dependent DAEs to reduce the influence of mismatches between training and test environments. In the first approach, we train multiple DAEs using speech from different acoustic environments, and the DAE for the condition that best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we add estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE, or a reverberation-aware DAE). The proposed method is evaluated using speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For the two-step environment-dependent DAE, the performance of environment identification based on the proposed DNN approach is also better than that of the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. Moreover, the one-step environment-dependent DAE significantly outperforms the two-step environment-dependent DAE.
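To make the reverberation-aware (one-step) idea concrete, the sketch below shows one way such a DAE could be set up: estimated reverberation features are concatenated with the reverberant-speech features at the input, and the network is trained to map the pair to clean-speech features. This is only an illustrative sketch, not the authors' implementation; the use of PyTorch, the feature dimensions, sigmoid activations, layer sizes, and the mean-squared-error loss are all assumptions.

```python
# Illustrative sketch (assumed architecture, not the paper's exact configuration)
# of a reverberation-aware denoising autoencoder: the input is the concatenation
# of reverberant-speech features and blindly estimated reverberation features,
# and the target is the corresponding clean-speech features.
import torch
import torch.nn as nn

class ReverbAwareDAE(nn.Module):
    def __init__(self, speech_dim=40, reverb_dim=40, hidden_dim=1024, n_hidden=3):
        super().__init__()
        layers = []
        in_dim = speech_dim + reverb_dim  # reverberant speech + estimated reverberation
        for _ in range(n_hidden):
            layers += [nn.Linear(in_dim, hidden_dim), nn.Sigmoid()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, speech_dim))  # output: enhanced (clean) features
        self.net = nn.Sequential(*layers)

    def forward(self, reverberant_feats, reverb_feats):
        x = torch.cat([reverberant_feats, reverb_feats], dim=-1)
        return self.net(x)

# Toy training step on random tensors, standing in for parallel
# (reverberant, estimated-reverberation, clean) feature frames.
model = ReverbAwareDAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

reverberant = torch.randn(32, 40)  # reverberant-speech features
reverb_est = torch.randn(32, 40)   # blindly estimated reverberation features
clean = torch.randn(32, 40)        # clean-speech targets

optimizer.zero_grad()
loss = criterion(model(reverberant, reverb_est), clean)
loss.backward()
optimizer.step()
```

In the two-step variant described above, the same kind of DAE would instead be trained separately per acoustic environment, with a DNN classifier (fed reverberant speech plus estimated reverberation) selecting which environment-specific DAE to apply at test time.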