FAR-FIELD SPEECH RECOGNITION USING CNN-DNN-HMM WITH CONVOLUTION IN TIME

Cited by: 0
Authors
Yoshioka, Takuya [1]
Karita, Shigeki [1, 2]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Osaka Univ, Grad Sch Engn, Osaka, Japan
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
Far-field speech recognition; reverberation; convolutional neural network; deep neural network; NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Recent studies in speech recognition have shown that the performance of convolutional neural networks (CNNs) is superior to that of fully connected deep neural networks (DNNs). In this paper, we explore the use of CNNs in far-field speech recognition for dealing with reverberation, which blurs spectral energies along the time axis. Unlike most previous CNN applications to speech recognition, we consider convolution in time to examine whether it provides an improved reverberation modelling capability. Experimental results show that a CNN coupled with a fully connected DNN can model short time correlations in feature vectors with fewer parameters than a DNN and thus generalise better to unseen test environments. Combining this approach with signal-space dereverberation, which copes with long-term correlations, is shown to result in further improvement, where the gains from both approaches are almost additive. An initial investigation of the use of restricted convolution forms is also undertaken.
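As a rough illustration of what "convolution in time" means, the following minimal sketch slides a small set of filters along the frame axis of a feature matrix and compares the weight count with that of a fully connected layer producing the same number of outputs. It is not the authors' architecture: the function names, the NumPy implementation, and all sizes (11 frames, 40 bands, 5-frame kernels, 8 filters) are hypothetical values chosen only for the example.

```python
import numpy as np

def conv_in_time(features, kernels):
    """Valid cross-correlation along the time (frame) axis.

    features: (num_frames, num_bands) array, e.g. filterbank outputs.
    kernels:  (num_filters, kernel_frames, num_bands) array; each filter
              spans all bands and `kernel_frames` consecutive frames.
    Returns:  (num_frames - kernel_frames + 1, num_filters) activations.
    """
    num_frames, num_bands = features.shape
    num_filters, kernel_frames, kernel_bands = kernels.shape
    if kernel_bands != num_bands:
        raise ValueError("kernels must span every frequency band")
    out_frames = num_frames - kernel_frames + 1
    out = np.empty((out_frames, num_filters))
    for t in range(out_frames):
        window = features[t:t + kernel_frames]  # (kernel_frames, num_bands)
        # The same weights are reused at every time shift t.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def conv_param_count(num_filters, kernel_frames, num_bands):
    # Weights are shared over time, so the count does not depend on num_frames.
    return num_filters * kernel_frames * num_bands

def dense_param_count(num_frames, num_bands, num_outputs):
    # A fully connected layer needs one weight per (input, output) pair.
    return num_frames * num_bands * num_outputs

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
features = rng.standard_normal((11, 40))   # 11 frames x 40 bands
kernels = rng.standard_normal((8, 5, 40))  # 8 filters, 5 frames each
activations = conv_in_time(features, kernels)

n_conv = conv_param_count(8, 5, 40)
n_dense = dense_param_count(11, 40, activations.size)
print(activations.shape, n_conv, n_dense)
```

Because the filter weights are shared across time shifts, the convolutional layer in this toy setting uses far fewer weights than the dense layer with the same output size, which is the kind of parameter saving the abstract refers to.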
Pages: 4360-4364
Number of pages: 5