Improving Interpretability and Regularization in Deep Learning

Cited by: 28
Authors
Wu, Chunyang [1 ]
Gales, Mark J. F. [1 ]
Ragni, Anton [1 ]
Karanasou, Penny [1 ]
Sim, Khe Chai [2 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] Google Inc, Mountain View, CA 94043 USA
Keywords
Activation regularisation; interpretability; visualisation; neural network; deep learning; NEURAL-NETWORKS; ADAPTATION;
DOI
10.1109/TASLP.2017.2774919
CLC Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403
Abstract
Deep learning approaches yield state-of-the-art performance in a range of tasks, including automatic speech recognition. However, the highly distributed representation in a deep neural network (DNN) or other network variants is difficult to analyze, making further parameter interpretation and regularization challenging. This paper presents a regularization scheme that acts on activation function outputs to improve network interpretability and regularization. The proposed approach, referred to as activation regularization, encourages activation function outputs to satisfy a target pattern. By defining appropriate target patterns, different learning concepts can be imposed on the network. This method can aid network interpretability and also has the potential to reduce overfitting. The scheme is evaluated on several continuous speech recognition tasks: the Wall Street Journal task, eight conversational telephone speech tasks from the IARPA Babel program, and a U.S. English broadcast news task. On all tasks, activation regularization achieved consistent performance gains over the standard DNN baselines.
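
As a rough illustration of the idea summarized above, the sketch below adds an auxiliary penalty that pulls selected hidden-layer activation outputs towards a predefined target pattern during training. The mean-squared-error form of the penalty, the weight lam, and the helper name activation_regularized_loss are illustrative assumptions rather than the exact formulation in the paper; PyTorch is assumed only for the tensor operations.

    import torch
    import torch.nn.functional as F

    def activation_regularized_loss(logits, labels, activations, targets, lam=0.1):
        # Standard cross-entropy training criterion on the network outputs.
        ce = F.cross_entropy(logits, labels)
        # Auxiliary term: penalize the distance between each chosen hidden-layer
        # activation output and its target pattern (MSE is a stand-in here for
        # whatever target-matching criterion is actually defined).
        reg = sum(F.mse_loss(h, t.expand_as(h)) for h, t in zip(activations, targets))
        # The interpolation weight lam trades the task loss off against the
        # regularizer; the value 0.1 is purely illustrative.
        return ce + lam * reg

In use, activations would be the outputs of the layers one wants to constrain, and targets the corresponding target patterns encoding the desired learning concept.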
Pages: 256-265
Number of pages: 10