FAR-FIELD SPEECH RECOGNITION USING CNN-DNN-HMM WITH CONVOLUTION IN TIME

被引：0

作者：

Yoshioka, Takuya ^{[1
]}

Karita, Shigeki ^{[1
,2
]}

Nakatani, Tomohiro ^{[1
]}

机构：

[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan

[2] Osaka Univ, Grad Sch Engn, Osaka, Japan

来源：

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015年

关键词：

Far-field speech recognition; reverberation; convolutional neural network; deep neural network; NEURAL-NETWORKS;

D O I：

暂无

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

Recent studies in speech recognition have shown that the performance of convolutional neural networks (CNNs) is superior to that of fully connected deep neural networks (DNNs). In this paper, we explore the use of CNNs in far-field speech recognition for dealing with reverberation, which blurs spectral energies along the time axis. Unlike most previous CNN applications to speech recognition, we consider convolution in time to examine whether it provides an improved reverberation modelling capability. Experimental results show that a CNN coupled with a fully connected DNN can model short time correlations in feature vectors with fewer parameters than a DNN and thus generalise better to unseen test environments. Combining this approach with signal-space dereverberation, which copes with long-term correlations, is shown to result in further improvement, where the gains from both approaches are almost additive. An initial investigation of the use of restricted convolution forms is also undertaken.

引用

页码：4360 / 4364

页数：5

共 29 条

[1] Abdel-Hamid O, 2013, INTERSPEECH, P3365
[2] Convolutional Neural Networks for Speech Recognition
Abdel-Hamid, Ossama
Mohamed, Abdel-Rahman
Jiang, Hui
Deng, Li
Penn, Gerald
Yu, Dong
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1533 - 1545
[3] Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
Dahl, George E.
Yu, Dong
Deng, Li
Acero, Alex
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): : 30 - 42
[4] Delcroix M., 2014, Proceedings of REVERB Challenge Workshop
[5] Gales M. J. F., 2011, 2011 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA 2011), P121, DOI 10.1109/HSCMA.2011.5942377
[6] Habets E. A. P., 2006, THESIS
[7] Deep Neural Networks for Acoustic Modeling in Speech Recognition
Hinton, Geoffrey
Deng, Li
Yu, Dong
Dahl, George E.
Mohamed, Abdel-rahman
Jaitly, Navdeep
Senior, Andrew
Vanhoucke, Vincent
Patrick Nguyen
Sainath, Tara N.
Kingsbury, Brian
[J]. IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 82 - 97
[8] Kinoshita T, 2013, 2013 9TH INTERNATIONAL WORKSHOP ON ELECTROMAGNETIC COMPATIBILITY OF INTEGRATED CIRCUITS (EMC COMPO 2013), P1, DOI 10.1109/EMCCompo.2013.6735162
[9] Knill KM, 2013, 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), P138, DOI 10.1109/ASRU.2013.6707719
[10] Model-Based Feature Enhancement for Reverberant Speech Recognition
Krueger, Alexander
Haeb-Umbach, Reinhold
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07): : 1692 - 1707

← 1 2 3 →