Improving Weight Initialization of ReLU and Output Layers

Cited by: 9
Authors
Aguirre, Diego [1 ]
Fuentes, Olac [1 ]
Affiliations
[1] Univ Texas El Paso, El Paso, TX 79968 USA
Source
Artificial Neural Networks and Machine Learning - ICANN 2019: Deep Learning, Part II | 2019 / Vol. 11728
Keywords
Weight initialization
DOI
10.1007/978-3-030-30484-3_15
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We introduce a data-dependent weight initialization scheme for the ReLU and output layers commonly found in modern neural network architectures. An initial feedforward pass through the network is performed using an initialization set (a subset of the training data set). Using statistics gathered during this pass, we initialize the weights of the network so that the following properties hold: (1) weight matrices are orthogonal; (2) ReLU layers produce a predetermined fraction of nonzero activations; (3) the outputs of internal layers have a predetermined variance; (4) the weights of the last layer minimize the squared error on the initialization set. We evaluate our method on popular architectures (VGG16, VGG19, and InceptionV3) and achieve faster convergence on the ImageNet data set than state-of-the-art initialization techniques (LSUV, He, and Glorot).
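To make the four properties concrete, the following NumPy sketch shows a simplified, single-dense-layer version of this kind of data-dependent initialization. It is not the authors' exact algorithm: the helper names (init_relu_layer, init_output_layer), the quantile-based bias choice, and the ridge term are illustrative assumptions, and the paper's method is applied layer by layer to full convolutional architectures.

```python
import numpy as np

def orthogonal(fan_in, fan_out, rng):
    """Random matrix with orthonormal columns via QR (assumes fan_in >= fan_out)."""
    q, _ = np.linalg.qr(rng.standard_normal((fan_in, fan_out)))
    return q

def init_relu_layer(X, fan_out, p_active=0.5, target_var=1.0, rng=None):
    """Data-dependent init of one dense ReLU layer (hypothetical helper).

    X: inputs to the layer from the initialization set, shape (n_samples, fan_in).
    Returns (W, b) such that roughly a fraction p_active of the ReLU
    activations are nonzero and each unit's output variance is near target_var.
    """
    rng = rng or np.random.default_rng(0)
    W = orthogonal(X.shape[1], fan_out, rng)   # property (1): orthogonal weights
    Z = X @ W                                  # pre-activations on the init set
    # Property (2): shift each unit so the desired fraction of its
    # pre-activations is positive (quantile-based bias is an assumption here).
    b = -np.quantile(Z, 1.0 - p_active, axis=0)
    A = np.maximum(Z + b, 0.0)
    # Property (3): rescale so each unit's output variance matches the target.
    # ReLU is positively homogeneous, so scaling W and b scales the output;
    # the fraction of nonzero activations is unchanged.
    scale = np.sqrt(target_var / (A.var(axis=0) + 1e-8))
    return W * scale, b * scale

def init_output_layer(H, Y, ridge=1e-3):
    """Property (4): fit the last layer by least squares on the initialization
    set, minimizing ||H W + b - Y||^2 (a small ridge term adds stability)."""
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])  # append a bias column
    G = H1.T @ H1 + ridge * np.eye(H1.shape[1])
    Wb = np.linalg.solve(G, H1.T @ Y)
    return Wb[:-1], Wb[-1]

# Toy usage on random stand-in data (shapes are illustrative only).
rng = np.random.default_rng(0)
X0 = rng.standard_normal((512, 256))            # "initialization set" inputs
Y = np.eye(10)[rng.integers(0, 10, size=512)]   # one-hot targets
W1, b1 = init_relu_layer(X0, 128, p_active=0.5, rng=rng)
H = np.maximum(X0 @ W1 + b1, 0.0)               # forward pass through the layer
W2, b2 = init_output_layer(H, Y)
```

Note that the per-unit rescaling leaves the columns of W mutually orthogonal, though no longer unit-norm; in a deep network the same procedure would be repeated layer by layer, each layer using the activations produced by the already-initialized layers below it.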
Pages: 170-184
Number of pages: 15
References
21 items in total
[1] [Anonymous], Proceedings of the 3rd International Conference on Learning Representations
[2] [Anonymous], 2013, arXiv:1312.6120
[3] [Anonymous], 2016, arXiv:1611.01491
[4] [Anonymous], 2017, Proceedings of the 31st AAAI Conference on Artificial Intelligence
[5] [Anonymous], WEIGHT INITIALIZATIO
[6] [Anonymous], BMVC
[7] Deng J., 2009, Proc. IEEE CVPR, p. 248, DOI 10.1109/CVPRW.2009.5206848
[8]  
Glorot X., 2010, P 13 INT C ART INT S, P249
[9] He K., Zhang X., Ren S., Sun J., 2016, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778
[10] He K., 2018, CoRR, abs/1811.08883