Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization

Cited by: 0
Authors
Ueda, Yuma [1]
Wang, Longbiao [2]
Kai, Atsuhiko [1]
Xiao, Xiong [3]
Chng, Eng Siong [3]
Li, Haizhou [4]
Affiliations
[1] Shizuoka Univ, Grad Sch Engn, Hamamatsu, Shizuoka 4328561, Japan
[2] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan
[3] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[4] Inst Infocomm Res, Human Language Technol, Singapore, Singapore
Source
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014
Keywords
speech recognition; dereverberation; denoising autoencoder; environment adaptation; distant-talking speech; SPECTRAL SUBTRACTION; REVERBERATION; ADAPTATION; DOMAIN; MODEL
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) with a temporal structure normalization (TSN) filter. The DAE is first applied in the cepstral domain of the speech to suppress reverberation. A TSN filter is then applied as post-processing to reduce noise and reverberation effects by normalizing the modulation spectra to reference spectra of clean speech. The proposed method was evaluated on speech from simulated and real reverberant environments. Combining the cepstral-domain DAE with TSN reduced the average word error rate (WER) from 25.2% for the baseline system to 21.2% in simulated environments, and from 47.5% to 41.3% in real environments.
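The WER figures in the abstract are absolute percentages. As a quick way to compare the two conditions, the minimal Python sketch below derives the absolute and relative reductions from the four quoted values. The helper name `wer_reduction` is hypothetical and introduced here only for illustration; it is not taken from the paper.

```python
def wer_reduction(baseline, proposed):
    """Return (absolute, relative) WER reduction, both in percent.

    Hypothetical helper for illustration; inputs are WERs in percent.
    """
    absolute = baseline - proposed          # percentage points
    relative = 100.0 * absolute / baseline  # percent of the baseline WER
    return absolute, relative


# WER values quoted in the abstract (percent): (baseline, DAE + TSN).
results = {"simulated": (25.2, 21.2), "real": (47.5, 41.3)}

for condition, (baseline, proposed) in results.items():
    absolute, relative = wer_reduction(baseline, proposed)
    print(f"{condition}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```

Under this calculation the reduction is about 4.0 points (roughly 15.9% relative) in simulated environments and about 6.2 points (roughly 13.1% relative) in real environments.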
Pages: 379 / +
Page count: 3