Reverberant Speech Recognition Based on Denoising Autoencoder

Cited by: 0
Authors
Ishii, Takaaki [1]
Komiyama, Hiroki [1]
Shinozaki, Takahiro [2]
Horiuchi, Yasuo [1]
Kuroiwa, Shingo [1]
Affiliations
[1] Chiba Univ, Grad Sch Adv Integrat Sci, Div Informat Sci, Chiba, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Dept Informat Proc, Tokyo, Japan
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
Denoising autoencoder; reverberant speech recognition; restricted Boltzmann machine; distant-talking speech recognition; CENSREC-4; REPRESENTATIONS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A denoising autoencoder is applied to reverberant speech recognition as a noise-robust front-end that reconstructs the clean speech spectrum from noisy input. To capture the context effects of speech sounds, multiple short-windowed spectral frames are concatenated to form a single input vector. Additionally, a combination of short- and long-term spectra is investigated to properly handle the long impulse response of reverberation while keeping the time resolution necessary for speech recognition. Experiments are performed using the CENSREC-4 dataset, which is designed as an evaluation framework for distant-talking speech recognition. The results show that the proposed denoising-autoencoder-based front-end using the short-windowed spectra gives better results than conventional methods, and that combining the long-term spectra yields further improvement. The recognition accuracy of the proposed method using the short- and long-term spectra is 97.0% on the open-condition test set of the dataset, compared with 87.8% for a baseline based on multi-condition training. As a supplemental experiment, large-vocabulary speech recognition is also performed, confirming the effectiveness of the proposed method.
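The input construction described in the abstract (concatenating neighbouring short-windowed frames, then adding a long-term spectrum) might be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the context widths, the edge-padding rule, and the assumption that one long-term spectrum is available per short-term frame are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def stack_context(frames: np.ndarray, left: int, right: int) -> np.ndarray:
    """Concatenate each frame with its neighbours into one input vector.

    frames: (num_frames, num_bins) array of short-windowed spectral frames.
    Returns an array of shape (num_frames, (left + 1 + right) * num_bins).
    Edges are padded by repeating the first/last frame (an assumption).
    """
    num_frames, _ = frames.shape
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    # Column block k holds the frame at offset (k - left) from the centre frame.
    blocks = [padded[k:k + num_frames] for k in range(left + 1 + right)]
    return np.concatenate(blocks, axis=1)


def combine_short_long(short_frames: np.ndarray, long_frames: np.ndarray,
                       left: int, right: int) -> np.ndarray:
    """Append one long-term spectrum per frame to the stacked short-term context.

    Assumes long_frames is already aligned to short_frames (same frame count).
    """
    if short_frames.shape[0] != long_frames.shape[0]:
        raise ValueError("short- and long-term streams must have equal length")
    stacked = stack_context(short_frames, left, right)
    return np.concatenate([stacked, long_frames], axis=1)


# Toy data: 4 frames of 3 bins (short-term) and 4 frames of 5 bins (long-term).
short = np.arange(12, dtype=float).reshape(4, 3)
long_term = np.ones((4, 5))
dae_input = combine_short_long(short, long_term, left=1, right=1)
```

Each row of `dae_input` would then serve as one input vector to the autoencoder, with the corresponding clean spectrum as the training target.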
Pages: 3479-3483
Number of pages: 5