Dying ReLU and Initialization: Theory and Numerical Examples

Cited by: 159
Authors
Lu, Lu [1 ]
Shin, Yeonjong [2 ]
Su, Yanhui [3 ]
Karniadakis, George Em [2 ]
Affiliations
[1] MIT, Dept Math, Cambridge, MA 02139 USA
[2] Brown Univ, Div Appl Math, Providence, RI 02912 USA
[3] Fuzhou Univ, Coll Math & Comp Sci, Fuzhou 350116, Fujian, Peoples R China
Keywords
Neural network; Dying ReLU; Vanishing/exploding gradient; Randomized asymmetric initialization; Deep neural networks; Error
DOI
10.4208/cicp.OA-2020-0165
Chinese Library Classification
O4 [Physics]
Discipline code
0702
Abstract
The dying ReLU refers to the problem in which ReLU neurons become inactive and output 0 for every input. There are many empirical and heuristic explanations of why ReLU neurons die, but little is known on the theoretical side. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth tends to infinity. Several methods have been proposed to alleviate the dying ReLU; perhaps the simplest treatment is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU. We therefore propose a new initialization procedure, namely, a randomized asymmetric initialization. We show that the new initialization can effectively prevent the dying ReLU, and all parameters required for it are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.
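The abstract's central claim, that a symmetrically initialized deep ReLU network is "born dead" with probability approaching one as the depth grows, can be checked with a small experiment. The sketch below is illustrative and not the paper's code: it assumes He-style symmetric Gaussian weights, zero biases, a fixed width of 2, and scalar inputs, and simply counts how often a freshly initialized network outputs zero for every sampled input.

```python
import numpy as np

def is_dead(depth, width=2, n_inputs=100, rng=None):
    """Build a random ReLU net with symmetric (Gaussian) initialization
    and check whether it is 'born dead': zero output for all sampled inputs."""
    rng = rng if rng is not None else np.random.default_rng()
    x = rng.uniform(-1.0, 1.0, size=(n_inputs, 1))  # inputs sampled in [-1, 1]
    h = x
    for _ in range(depth):
        # He-style scaling of a symmetric normal distribution (an assumption,
        # not necessarily the exact scheme analyzed in the paper)
        W = rng.standard_normal((h.shape[1], width)) * np.sqrt(2.0 / h.shape[1])
        h = np.maximum(0.0, h @ W)  # ReLU layer with zero bias
    # Dead: every hidden unit outputs 0 for every sampled input
    return bool(np.all(h == 0.0))

def dead_fraction(depth, trials=200, seed=0):
    """Fraction of randomly initialized nets of the given depth that are born dead."""
    rng = np.random.default_rng(seed)
    return float(np.mean([is_dead(depth, rng=rng) for _ in range(trials)]))

if __name__ == "__main__":
    for d in (3, 10, 30):
        print(f"depth {d:2d}: dead fraction = {dead_fraction(d):.2f}")
```

As the paper's theorem predicts for fixed width, the dead fraction rises toward one as the depth increases. The proposed remedy is to replace the symmetric draw with a randomized asymmetric one, so that each neuron retains a guaranteed chance of activation; the exact recipe and its parameters are derived in the paper itself.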
Pages: 1671-1706
Number of pages: 36