FAR-FIELD SPEECH RECOGNITION USING CNN-DNN-HMM WITH CONVOLUTION IN TIME

Cited by: 0

Authors:

Yoshioka, Takuya [1]

Karita, Shigeki [1,2]

Nakatani, Tomohiro [1]

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan

[2] Osaka Univ, Grad Sch Engn, Osaka, Japan

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

Far-field speech recognition; reverberation; convolutional neural network; deep neural network; NEURAL-NETWORKS;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recent studies in speech recognition have shown that the performance of convolutional neural networks (CNNs) is superior to that of fully connected deep neural networks (DNNs). In this paper, we explore the use of CNNs in far-field speech recognition for dealing with reverberation, which blurs spectral energies along the time axis. Unlike most previous CNN applications to speech recognition, we consider convolution in time to examine whether it provides an improved reverberation modelling capability. Experimental results show that a CNN coupled with a fully connected DNN can model short time correlations in feature vectors with fewer parameters than a DNN and thus generalise better to unseen test environments. Combining this approach with signal-space dereverberation, which copes with long-term correlations, is shown to result in further improvement, where the gains from both approaches are almost additive. An initial investigation of the use of restricted convolution forms is also undertaken.


Pages: 4360-4364

Page count: 5

