Single-channel Dereverberation for Distant-Talking Speech Recognition by Combining Denoising Autoencoder and Temporal Structure Normalization

Cited: 0
Authors
Yuma Ueda
Longbiao Wang
Atsuhiko Kai
Xiong Xiao
Eng Siong Chng
Haizhou Li
Affiliations
[1] Graduate School of Engineering, Shizuoka University
[2] Nagaoka University of Technology
[3] Temasek Laboratories @ NTU, Nanyang Technological University
[4] School of Computer Engineering, Nanyang Technological University
[5] Human Language Technology, Institute for Infocomm Research, A*STAR
Source
Journal of Signal Processing Systems | 2016, Volume 82
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment adaptation; Distant-talking speech;
DOI
Not available
Abstract
In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) with a temporal structure normalization (TSN) filter. Because the DAE has a deep structure with nonlinear processing steps, it is flexible enough to model the highly nonlinear mapping between the input and output spaces. We train the DAE to map reverberant and noisy speech features to the underlying clean speech features in the cepstral domain. After the DAE suppresses reverberation in the cepstral domain, we apply a post-processing step based on the TSN filter, which further reduces noise and reverberation effects by normalizing the modulation spectra of the features to reference spectra of clean speech. The proposed method was evaluated on speech in simulated and real reverberant environments. By combining the cepstral-domain DAE and TSN, the average word error rate (WER) was reduced from 25.2 % for the baseline system to 21.2 % in the simulated environments, and from 47.5 % to 41.3 % in the real environments.
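To make the two-stage front-end described in the abstract concrete, the Python (PyTorch/NumPy) sketch below shows a minimal cepstral-domain DAE that maps spliced reverberant cepstral frames to clean centre frames, followed by a simplified TSN-style step that rescales the modulation spectrum of each feature trajectory toward a clean-speech reference. The class and function names (CepstralDAE, splice, tsn_normalize), the layer sizes, context width, sigmoid activations, and the per-utterance FFT-magnitude scaling are illustrative assumptions, not the configuration or exact TSN filter design reported in the paper.

import numpy as np
import torch
import torch.nn as nn

class CepstralDAE(nn.Module):
    """Feed-forward DAE over spliced cepstral frames (illustrative sizes)."""
    def __init__(self, feat_dim=39, context=5, hidden=512):
        super().__init__()
        in_dim = feat_dim * (2 * context + 1)        # spliced input window
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, feat_dim),             # predict the clean centre frame
        )

    def forward(self, spliced):                      # spliced: (batch, in_dim)
        return self.net(spliced)

def splice(feats, context=5):
    """Stack +/- context frames around each frame; feats is (T, D)."""
    T, D = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1) for t in range(T)])

def tsn_normalize(feats, ref_mag):
    """Simplified TSN-style step: impose a clean-speech reference magnitude on the
    modulation spectrum of each cepstral trajectory, keeping the observed phase.
    ref_mag: (n_fft // 2 + 1, D); assumes the FFT length covers the utterance."""
    T, D = feats.shape
    n = 2 * (ref_mag.shape[0] - 1)                   # FFT length implied by the reference
    out = np.empty_like(feats)
    for d in range(D):
        spec = np.fft.rfft(feats[:, d], n=n)         # modulation spectrum of one trajectory
        spec *= ref_mag[:, d] / (np.abs(spec) + 1e-8)
        out[:, d] = np.fft.irfft(spec, n=n)[:T]
    return out

# Illustrative enhancement pipeline (inference): reverberant MFCCs -> DAE -> TSN-style filter.
# mfcc_rev: (T, 39) reverberant cepstral features; dae: a trained CepstralDAE;
# ref_mag: clean-speech reference modulation magnitudes.
#   x = torch.from_numpy(splice(mfcc_rev, context=5)).float()
#   enhanced = dae(x).detach().numpy()
#   enhanced = tsn_normalize(enhanced, ref_mag)

In this sketch the DAE handles the nonlinear frame-level mapping while the TSN-style step acts purely as a per-trajectory post-filter, mirroring the division of labour between the two stages described in the abstract.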
Pages: 151-161
Number of pages: 10