MULTI SCALE FEEDBACK CONNECTION FOR NOISE ROBUST ACOUSTIC MODELING

Cited by: 0
Authors
Tran, Dung T. [1]
Iso, Ken-ichi [1]
Omachi, Motoi [1]
Fujita, Yuya [1]
Affiliations
[1] Yahoo Japan Corp, Tokyo, Japan
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
noise robust ASR; feedback connection; robust feature extraction; acoustic modeling; CONVOLUTIONAL NEURAL-NETWORKS; SPEECH;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Simply feeding the last hidden layer of a deep neural network (DNN) back to the input layer was recently found to be effective for noise-robust acoustic modeling. This high-level feature strengthens the robustness of a DNN-based acoustic model at approximately twice the computational cost. In this paper, we propose to feed this high-level feature iteratively back to lower layers, which we refer to as a multi-scale feedback connection. To this end, we first extract the high-level feature at the last hidden layer of the DNN. Second, this high-level feature is fed back to lower-scale features, which then generate a subsequent prediction as well as a subsequent high-level feature. This subsequent high-level feature is in turn fed down to lower layers. We evaluated the proposed approach on both TIMIT and a large-scale internal dataset, which includes voice search and far-field data. Our findings are twofold. First, at equivalent computational cost, the multi-scale feedback connection outperforms the DNN, the DNN with skip connections, and the DNN with a feedback connection; the improvement is larger on the far-field data. Second, pair layer-wise pretraining helps the proposed approach converge better.
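The iterative feedback described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class name `MultiScaleFeedbackNet`, the tanh activations, the choice to concatenate the fed-back feature with a layer's normal input, the random weights, and the order of target layers (`targets`) are all hypothetical, since the abstract does not specify them.

```python
import math
import random


def affine(x, w, b):
    """Return w @ x + b, where w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init(rng, out_dim, in_dim):
    """Small random weights and zero biases (illustrative, untrained)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


class MultiScaleFeedbackNet:
    """Hypothetical feedforward net whose top hidden feature can be fed back."""

    def __init__(self, in_dim, hid, n_layers, n_out, seed=0):
        rng = random.Random(seed)
        self.hid = hid
        # Assumption: each hidden layer sees [normal input ; fed-back feature].
        self.layers = [init(rng, hid, (in_dim if i == 0 else hid) + hid)
                       for i in range(n_layers)]
        self.out = init(rng, n_out, hid)

    def _run(self, acts, start, feedback):
        """Recompute layers start..top; feedback is injected only at `start`."""
        zeros = [0.0] * self.hid
        for i in range(start, len(self.layers)):
            fb = feedback if i == start else zeros
            w, b = self.layers[i]
            acts[i + 1] = [math.tanh(v) for v in affine(acts[i] + fb, w, b)]
        return acts

    def forward(self, x, targets):
        """Return one prediction per pass: the plain pass plus one per target."""
        acts = [list(x)] + [None] * len(self.layers)
        acts = self._run(acts, 0, [0.0] * self.hid)  # first pass, no feedback
        preds = [softmax(affine(acts[-1], *self.out))]
        for t in targets:
            # Feed the current top hidden feature back to layer t, then
            # recompute upward to get a new top feature and a new prediction.
            acts = self._run(acts, t, acts[-1])
            preds.append(softmax(affine(acts[-1], *self.out)))
        return preds


net = MultiScaleFeedbackNet(in_dim=4, hid=3, n_layers=3, n_out=5)
preds = net.forward([0.1, -0.2, 0.3, 0.4], targets=[2, 1, 0])
print(len(preds), "predictions, one per pass")
```

In this sketch each feedback pass recomputes only the layers from the target layer upward, so feeding back to a higher layer costs less than feeding back to the input. Whether the paper balances its computational cost this way is not stated in the abstract.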
Pages: 4834-4838
Page count: 5