Reverberant Speech Recognition Based on Denoising Autoencoder

Cited by: 0

Authors:

Ishii, Takaaki [1]

Komiyama, Hiroki [1]

Shinozaki, Takahiro [2]

Horiuchi, Yasuo [1]

Kuroiwa, Shingo [1]

Affiliations:

[1] Chiba Univ, Grad Sch Adv Integrat Sci, Div Informat Sci, Chiba, Japan

[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Dept Informat Proc, Tokyo, Japan

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Denoising autoencoder; reverberant speech recognition; restricted Boltzmann machine; distant-talking speech recognition; CENSREC-4; REPRESENTATIONS

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

A denoising autoencoder is applied to reverberant speech recognition as a noise-robust front-end that reconstructs the clean speech spectrum from noisy input. To capture context effects of speech sounds, multiple short-windowed spectral frames are concatenated to form a single input vector. Additionally, a combination of short- and long-term spectra is investigated to properly handle the long impulse response of reverberation while keeping the time resolution necessary for speech recognition. Experiments are performed using the CENSREC-4 dataset, which is designed as an evaluation framework for distant-talking speech recognition. The results show that the proposed denoising-autoencoder-based front-end using the short-windowed spectra gives better results than conventional methods. Combining the long-term spectra yields further improvement. The recognition accuracy of the proposed method using the short- and long-term spectra is 97.0% on the open-condition test set of the dataset, compared with 87.8% for a baseline based on multi-condition training. As a supplemental experiment, large-vocabulary speech recognition is also performed, confirming the effectiveness of the proposed method.
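The frame concatenation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `splice_frames`, the `context` parameter, and the choice to pad the utterance edges by repeating the first and last frames are all assumptions made for illustration, and the abstract does not state the window size that was actually used.

```python
import numpy as np

def splice_frames(spec, context):
    """Concatenate each frame with `context` neighbours on each side.

    spec:    array of shape (num_frames, num_bins), one spectral frame per row
    returns: array of shape (num_frames, (2 * context + 1) * num_bins)
    """
    num_frames, _ = spec.shape
    # Assumption: repeat the edge frames so every frame gets a full window.
    padded = np.pad(spec, ((context, context), (0, 0)), mode="edge")
    # The window for original frame t is padded[t : t + 2*context + 1].
    windows = [padded[t:t + 2 * context + 1].reshape(-1)
               for t in range(num_frames)]
    return np.stack(windows)

# Hypothetical toy input: 4 frames with 3 spectral bins each.
spec = np.arange(12, dtype=float).reshape(4, 3)
spliced = splice_frames(spec, context=1)
```

Each row of `spliced` would then serve as one input vector to a network such as the denoising autoencoder, with the corresponding clean frame as the training target.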


Pages: 3479-3483

Number of pages: 5
