MULTI SCALE FEEDBACK CONNECTION FOR NOISE ROBUST ACOUSTIC MODELING

Cited by: 0

Authors:

Tran, Dung T. [1]

Iso, Ken-ichi [1]

Omachi, Motoi [1]

Fujita, Yuya [1]

Affiliations:

[1] Yahoo Japan Corp, Tokyo, Japan

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

noise robust ASR; feedback connection; robust feature extraction; acoustic modeling; CONVOLUTIONAL NEURAL-NETWORKS; SPEECH;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Simply feeding the last hidden layer of a deep neural network (DNN) back to the input layer was recently found to be effective for noise-robust acoustic modeling. Such a high-level feature strengthens the robustness of a DNN-based acoustic model at approximately twice the computational cost. In this paper, we propose to feed this high-level feature iteratively back to lower layers, which we refer to as a multi-scale feedback connection. To this end, we first extract the high-level feature at the last hidden layer of the DNN. Second, this high-level feature is fed back to lower-scale features, which then generate a subsequent prediction as well as a subsequent high-level feature. This subsequent high-level feature is in turn fed down to lower layers. We evaluated the proposed approach on both TIMIT and a large-scale internal dataset, which includes voice search and far-field data. Our findings are twofold. First, at equivalent computational cost, the multi-scale feedback connection outperforms the DNN, the DNN with skip connections, and the DNN with a feedback connection; the improvement is larger on the far-field dataset. Second, pair-layer-wise pretraining helps the proposed approach converge better.
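The iterative feedback described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class name `MultiScaleFeedbackNet`, the projection matrices `F`, the additive way the fed-back feature enters a layer, the ReLU activation, and the layer sizes are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiScaleFeedbackNet:
    """Hypothetical feed-forward net whose top hidden feature can be fed back."""

    def __init__(self, in_dim, hid_dim, out_dim, n_hidden, seed=0):
        rng = random.Random(seed)
        dims = [in_dim] + [hid_dim] * n_hidden
        # W[i] maps the input of hidden layer i to its pre-activation.
        self.W = [rand_matrix(dims[i + 1], dims[i], rng) for i in range(n_hidden)]
        # F[i] projects the fed-back top-level feature into hidden layer i
        # (assumed form; equivalent to concatenating it with the layer input).
        self.F = [rand_matrix(hid_dim, hid_dim, rng) for _ in range(n_hidden)]
        self.W_out = rand_matrix(out_dim, hid_dim, rng)

    def forward(self, x, feedback=None, target=None):
        """One pass; optionally inject `feedback` at hidden layer `target`."""
        h = x
        for i, W in enumerate(self.W):
            pre = matvec(W, h)
            if feedback is not None and i == target:
                fb = matvec(self.F[i], feedback)
                pre = [a + b for a, b in zip(pre, fb)]
            h = relu(pre)
        # Return the prediction and the last hidden layer (high-level feature).
        return softmax(matvec(self.W_out, h)), h

    def predict(self, x, targets):
        """Plain pass first, then one feedback pass per entry in `targets`."""
        probs, top = self.forward(x)
        outputs = [probs]
        for t in targets:
            # The newest high-level feature is fed to layer t, producing a
            # subsequent prediction and a subsequent high-level feature.
            probs, top = self.forward(x, feedback=top, target=t)
            outputs.append(probs)
        return outputs


net = MultiScaleFeedbackNet(in_dim=8, hid_dim=6, out_dim=4, n_hidden=4)
frame = [0.1 * i for i in range(8)]
# Feed back to progressively lower hidden layers: 2, then 1, then 0.
predictions = net.predict(frame, targets=[2, 1, 0])
```

For simplicity each pass here recomputes the network from the input; layers below the injection point are unaffected by the feedback, so a real system could cache their activations.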


Pages: 4834-4838

Page count: 5
