Transformed l1 regularization for learning sparse deep neural networks

Cited by: 81
Authors
Ma, Rongrong [1 ]
Miao, Jianyu [2 ]
Niu, Lingfeng [3 ]
Zhang, Peng [4 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Math Sci, Beijing 100049, Peoples R China
[2] Henan Univ Technol, Coll Informat Sci & Engn, Zhengzhou 450001, Henan, Peoples R China
[3] Univ Chinese Acad Sci, Sch Econ & Management, Beijing 100190, Peoples R China
[4] Ant Financial Serv Grp, Hangzhou 310012, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural networks; Non-convex regularization; Transformed l1; Group sparsity; VARIABLE SELECTION; REPRESENTATION; MINIMIZATION; DROPOUT;
DOI
10.1016/j.neunet.2019.08.015
CLC number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep Neural Networks (DNNs) have achieved extraordinary success in numerous areas. However, DNNs often carry a large number of weight parameters, leading to the challenge of heavy memory and computation costs. Overfitting is another challenge for DNNs when the training data are insufficient. These challenges severely hinder the application of DNNs in resource-constrained platforms. In fact, many network weights are redundant and can be removed from the network without much loss of performance. In this paper, we introduce a new non-convex integrated transformed l1 regularizer to promote sparsity for DNNs, which removes redundant connections and unnecessary neurons simultaneously. Specifically, we apply the transformed l1 regularizer to the matrix space of network weights and utilize it to remove redundant connections. Besides, group sparsity is integrated to remove unnecessary neurons. An efficient stochastic proximal gradient algorithm is presented to solve the new model. To the best of our knowledge, this is the first work to develop a non-convex regularizer in sparse optimization based method to simultaneously promote connection-level and neuron-level sparsity for DNNs. Experiments on public datasets demonstrate the effectiveness of the proposed method. (C) 2019 Elsevier Ltd. All rights reserved.
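To illustrate the abstract's idea, the sketch below combines the standard transformed l1 (TL1) penalty, rho_a(x) = (a+1)|x| / (a+|x|), applied elementwise for connection-level sparsity, with a row-wise group l2 term for neuron-level sparsity. This is a minimal illustration of the general technique, not the paper's exact formulation: the function names, the choice of rows as groups, and the weights `lam`/`mu` are assumptions for the example.

```python
import numpy as np

def tl1_penalty(W, a=1.0):
    """Elementwise transformed l1 (TL1) penalty:
    rho_a(x) = (a + 1) * |x| / (a + |x|).
    Interpolates between an l0-like penalty (small a)
    and an l1-like penalty (large a)."""
    absW = np.abs(W)
    return np.sum((a + 1.0) * absW / (a + absW))

def integrated_regularizer(W, a=1.0, lam=1e-4, mu=1e-4):
    """Illustrative combined regularizer: TL1 on individual
    weights (prunes connections) plus an l2,1 group norm over
    rows (prunes whole neurons when a row is driven to zero)."""
    group_term = np.sum(np.linalg.norm(W, axis=1))
    return lam * tl1_penalty(W, a) + mu * group_term
```

In a sparse-optimization training loop, such a regularizer would be handled by a proximal step after each stochastic gradient update on the data loss, rather than by differentiating the (non-smooth) penalty directly.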
Pages: 286-298 (13 pages)