SPARSE DEEP NEURAL NETWORKS USING L1,∞-WEIGHT NORMALIZATION

Cited by: 3
Authors
Wen, Ming [1]
Xu, Yixi [3 ]
Zheng, Yunling [2 ]
Yang, Zhouwang [1 ]
Wang, Xiao [3 ]
Affiliations
[1] Univ Sci & Technol China, Sch Math Sci, Hefei, Peoples R China
[2] Univ Sci & Technol China, Sch Gifted Young, Hefei, Peoples R China
[3] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Funding
U.S. National Science Foundation
Keywords
Deep neural networks; generalization; overfitting; Rademacher complexity; sparsity
DOI
10.5705/ss.202018.0468
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Deep neural networks (DNNs) have recently demonstrated excellent performance on many challenging tasks. However, overfitting remains a significant challenge in DNNs. Empirical evidence suggests that inducing sparsity can relieve overfitting, and that weight normalization can accelerate algorithm convergence. In this study, we employ L1,∞ weight normalization for DNNs with bias neurons to achieve a sparse architecture. We theoretically establish generalization error bounds for both regression and classification under L1,∞ weight normalization. Furthermore, we show that the upper bounds are independent of the network width and depend on the network depth k only through a factor of √k; these are the best available bounds for networks with bias neurons. These results provide theoretical justification for using such weight normalization to reduce the generalization error. We also develop an easily implemented gradient projection descent algorithm to obtain a sparse neural network in practice. Finally, we present various experiments that validate our theory and demonstrate the effectiveness of the resulting approach.
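The gradient projection descent algorithm mentioned in the abstract alternates ordinary gradient steps with a Euclidean projection of each layer's weight matrix back onto the L1,∞ ball {W : max_i ||W[i, :]||_1 <= c}. Because that constraint set is a product of L1 balls over the rows, the projection decomposes row by row, and each violating row can be projected with the sorting-based L1-ball algorithm of Duchi et al. (2008). The NumPy sketch below illustrates this decomposition; the function names and the radius c are illustrative, not taken from the paper, and the paper's exact handling of bias neurons may differ.

import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection of vector v onto the L1 ball of the given radius,
    # via the sorting-based algorithm of Duchi et al. (2008).
    if np.sum(np.abs(v)) <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
    cssv = np.cumsum(u)
    js = np.arange(1, len(u) + 1)
    rho = js[u - (cssv - radius) / js > 0][-1]   # last index still active
    theta = (cssv[rho - 1] - radius) / rho       # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def project_l1_inf(W, c):
    # Project W onto {W : max_i ||W[i, :]||_1 <= c}, row by row.
    return np.vstack([project_l1_ball(row, c) for row in W])

# One hypothetical projected-gradient step for a layer with weights W:
#   W = project_l1_inf(W - lr * grad_W, c)

The projection soft-thresholds every row that violates the constraint, zeroing its smallest entries; repeating this after each gradient step is what drives the trained network toward a sparse architecture.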
Pages: 1397-1414
Page count: 18