Reverberant Speech Recognition Based on Denoising Autoencoder

Cited by: 0
Authors
Ishii, Takaaki [1]
Komiyama, Hiroki [1]
Shinozaki, Takahiro [2]
Horiuchi, Yasuo [1]
Kuroiwa, Shingo [1]
Affiliations
[1] Chiba Univ, Grad Sch Adv Integrat Sci, Div Informat Sci, Chiba, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Dept Informat Proc, Tokyo, Japan
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
Denoising autoencoder; reverberant speech recognition; restricted Boltzmann machine; distant-talking speech recognition; CENSREC-4; REPRESENTATIONS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A denoising autoencoder is applied to reverberant speech recognition as a noise-robust front-end that reconstructs the clean speech spectrum from noisy input. To capture the context effects of speech sounds, multiple short-windowed spectral frames are concatenated to form a single input vector. Additionally, a combination of short- and long-term spectra is investigated to properly handle the long impulse response of reverberation while keeping the time resolution necessary for speech recognition. Experiments are performed using the CENSREC-4 dataset, which is designed as an evaluation framework for distant-talking speech recognition. The results show that the proposed denoising-autoencoder-based front-end using the short-windowed spectra gives better results than conventional methods, and that combining it with the long-term spectra yields further improvement. The recognition accuracy of the proposed method using the short- and long-term spectra is 97.0% on the open-condition test set of the dataset, compared with 87.8% for a baseline based on multi-condition training. As a supplemental experiment, large-vocabulary speech recognition is also performed, confirming the effectiveness of the proposed method.
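The frame concatenation mentioned in the abstract can be illustrated with a minimal sketch. It only shows how neighbouring spectral frames might be spliced into one input vector per time step; the function name `splice_frames`, the context width, and the edge padding by repeating the boundary frame are assumptions made for illustration and are not taken from the paper.

```python
def splice_frames(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    `frames` is a list of equal-length lists (one spectral frame each).
    Edge handling is an assumption: the first or last frame is repeated.
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        vector = []
        for offset in range(-context, context + 1):
            # Clamp the index so positions past the edges reuse the boundary frame.
            idx = min(max(t + offset, 0), last)
            vector.extend(frames[idx])
        spliced.append(vector)
    return spliced


# Toy data: 4 frames with 2 spectral bins each, 1 frame of context per side.
frames = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
spliced = splice_frames(frames, context=1)
```

Each spliced vector has (2 * context + 1) times the original frame dimension, so the toy frames above grow from 2 to 6 values per time step.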
Pages: 3479-3483
Number of pages: 5
Related papers
50 items in total
  • [1] Music Removal by Convolutional Denoising Autoencoder in Speech Recognition
    Zhao, Mengyuan
    Wang, Dong
    Zhang, Zhiyong
    Zhang, Xuewei
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 338 - 341
  • [2] SPEECH FEATURE DENOISING AND DEREVERBERATION VIA DEEP AUTOENCODERS FOR NOISY REVERBERANT SPEECH RECOGNITION
    Feng, Xue
    Zhang, Yaodong
    Glass, James
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] Environment-dependent denoising autoencoder for distant-talking speech recognition
    Yuma Ueda
    Longbiao Wang
    Atsuhiko Kai
    Bo Ren
    EURASIP Journal on Advances in Signal Processing, 2015
  • [4] Environment-dependent denoising autoencoder for distant-talking speech recognition
    Ueda, Yuma
    Wang, Longbiao
    Kai, Atsuhiko
    Ren, Bo
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [5] Model-Based Feature Enhancement for Reverberant Speech Recognition
    Krueger, Alexander
    Haeb-Umbach, Reinhold
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07): : 1692 - 1707
  • [6] INTEGRATING DENOISING AUTOENCODER AND VECTOR TAYLOR SERIES WITH AUDITORY MASKING FOR SPEECH RECOGNITION IN NOISY CONDITIONS
    Das, A. Biswajit
    Panda, Ashish
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2305 - 2309
  • [7] Performance Estimation of Reverberant Speech Recognition Based on Reverberant Criteria RSR-Dn with Acoustic Parameters
    Fukumori, Takahiro
    Morise, Masanori
    Nishiura, Takanobu
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 562 - +
  • [8] Strategies for distant speech recognition in reverberant environments
    Delcroix, Marc
    Yoshioka, Takuya
    Ogawa, Atsunori
    Kubo, Yotaro
    Fujimoto, Masakiyo
    Ito, Nobutaka
    Kinoshita, Keisuke
    Espi, Miquel
    Araki, Shoko
    Hori, Takaaki
    Nakatani, Tomohiro
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [9] Ensemble Modeling of Denoising Autoencoder for Speech Spectrum Restoration
    Lu, Xugang
    Tsao, Yu
    Matsuda, Shigeki
    Hori, Chiori
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 885 - 889
  • [10] Improved Automatic Speech Recognition using Subband Temporal Envelope Features and Time-delay Neural Network Denoising Autoencoder
    Cong-Thanh Do
    Stylianou, Yannis
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3832 - 3836