Reverberant Speech Recognition Based on Denoising Autoencoder

Times cited: 0

Authors:

Ishii, Takaaki [1]

Komiyama, Hiroki [1]

Shinozaki, Takahiro [2]

Horiuchi, Yasuo [1]

Kuroiwa, Shingo [1]

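Affiliations:

[1] Chiba University, Graduate School of Advanced Integration Science, Division of Information Science, Chiba, Japan

[2] Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Department of Information Processing, Tokyo, Japan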

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Denoising autoencoder; reverberant speech recognition; restricted Boltzmann machine; distant-talking speech recognition; CENSREC-4; representations

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

A denoising autoencoder is applied to reverberant speech recognition as a noise-robust front-end that reconstructs the clean speech spectrum from noisy input. To capture the context effects of speech sounds, multiple short-windowed spectral frames within a window are concatenated to form a single input vector. In addition, a combination of short- and long-term spectra is investigated to handle the long impulse response of reverberation while retaining the time resolution needed for speech recognition. Experiments are performed on the CENSREC-4 dataset, which is designed as an evaluation framework for distant-talking speech recognition. The results show that the proposed denoising-autoencoder-based front-end using the short-windowed spectra outperforms conventional methods, and that adding the long-term spectra yields further improvement. Using both short- and long-term spectra, the proposed method achieves a recognition accuracy of 97.0% on the open-condition test set of the dataset, compared with 87.8% for a baseline based on multi-condition training. In a supplementary experiment, large-vocabulary speech recognition is also performed, and the effectiveness of the proposed method is confirmed.
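The context-window front-end described in the abstract can be illustrated with a minimal sketch, assuming NumPy and a toy setup. `stack_context` concatenates each short-windowed frame with its k neighbors on each side to form one input vector, and `TinyDAE` (a hypothetical one-hidden-layer network) is trained by plain backpropagation to map that noisy context vector to the clean center frame. The layer sizes, edge padding, center-frame target, learning rate, and the synthetic "reverberant" data from `toy_reverberant_data` are all illustrative assumptions. The sketch omits the restricted Boltzmann machine pretraining listed in the keywords and does not model the long-term spectra, so it is not the authors' configuration.

```python
import numpy as np


def stack_context(frames: np.ndarray, k: int) -> np.ndarray:
    """Concatenate each frame with its k preceding and k following frames.

    frames: (T, D) array, one spectral feature vector per frame.
    Returns a (T, (2k+1)*D) array. Edges are padded by repeating the
    first/last frame (an assumption; edge handling is not given in the abstract).
    """
    T = frames.shape[0]
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    # Block i holds frame t+i-k for every t, so block k is the center frame.
    return np.hstack([padded[i:i + T] for i in range(2 * k + 1)])


class TinyDAE:
    """Hypothetical DAE: sigmoid hidden layer, linear output (context -> clean frame)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x: np.ndarray):
        h = 1.0 / (1.0 + np.exp(-(x @ self.W1 + self.b1)))
        return h, h @ self.W2 + self.b2

    def loss(self, x: np.ndarray, y: np.ndarray) -> float:
        # Half the squared error per frame, averaged over frames.
        _, y_hat = self.forward(x)
        return float(0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1)))

    def train_step(self, x: np.ndarray, y: np.ndarray, lr: float) -> None:
        # One full-batch gradient-descent step on self.loss (no RBM pretraining).
        h, y_hat = self.forward(x)
        err = (y_hat - y) / x.shape[0]
        d_hidden = (err @ self.W2.T) * h * (1.0 - h)  # uses W2 before the update
        self.W2 -= lr * (h.T @ err)
        self.b2 -= lr * err.sum(axis=0)
        self.W1 -= lr * (x.T @ d_hidden)
        self.b1 -= lr * d_hidden.sum(axis=0)


def toy_reverberant_data(T: int = 400, D: int = 8, seed: int = 1):
    """Synthetic stand-in for clean/reverberant spectra (NOT CENSREC-4 data).

    Each "reverberant" frame adds decaying copies of the two previous clean
    frames plus a little noise: a crude feature-domain reverberation tail.
    """
    rng = np.random.default_rng(seed)
    clean = rng.normal(size=(T, D))
    reverb = clean.copy()
    reverb[1:] += 0.5 * clean[:-1]
    reverb[2:] += 0.25 * clean[:-2]
    reverb += 0.05 * rng.normal(size=(T, D))
    return clean, reverb


def run_demo(k: int = 2, steps: int = 2000, lr: float = 0.1):
    clean, reverb = toy_reverberant_data()
    x = stack_context(reverb, k)  # noisy context window as the DAE input
    y = clean                     # clean center frame as the target
    dae = TinyDAE(x.shape[1], 32, y.shape[1])
    before = dae.loss(x, y)
    for _ in range(steps):
        dae.train_step(x, y, lr)
    return before, dae.loss(x, y)


if __name__ == "__main__":
    before, after = run_demo()
    print(f"training loss: {before:.3f} -> {after:.3f}")
```

In this sketch, predicting only the center frame keeps the output size equal to one frame, while the context window lets the network draw on neighboring frames that the reverberation tail has smeared into the current one.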

Pages: 3479-3483

Number of pages: 5
