Sparse evolutionary deep learning with over one million artificial neurons on commodity hardware

Cited by: 36
Authors
Liu, Shiwei [1]
Mocanu, Decebal Constantin [1,3]
Matavalam, Amarsagar Reddy Ramapuram [2]
Pei, Yulong [1]
Pechenizkiy, Mykola [1]
Affiliations
[1] Eindhoven Univ Technol, Dept Math & Comp Sci, NL-5600 MB Eindhoven, Netherlands
[2] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA USA
[3] Univ Twente, Fac Elect Engn Math & Comp Sci, NL-7522 NB Enschede, Netherlands
Keywords
Truly sparse neural networks; Sparse evolutionary training (SET); Microarray gene expression; Adaptive sparse connectivity; Feature selection; Classification; Inference; DNA
DOI
10.1007/s00521-020-05136-7
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Artificial neural networks (ANNs) have become a central topic in the research community. Despite their success, it is challenging to train and deploy modern ANNs on commodity hardware due to ever-increasing model sizes and the unprecedented growth in data volumes. Microarray data are particularly difficult for machine learning techniques to handle, owing to their very high dimensionality and small number of samples. Furthermore, specialized hardware such as graphics processing units (GPUs) is expensive. Sparse neural networks are a leading approach to addressing these challenges. However, off-the-shelf sparsity-inducing techniques either operate on a pretrained model or enforce the sparse structure via binary masks, so the training efficiency of sparse neural networks cannot be realized in practice. In this paper, we introduce a technique that allows us to train truly sparse neural networks with a fixed parameter count throughout training. Our experimental results demonstrate that our method can be applied directly to high-dimensional data while achieving higher accuracy than traditional two-phase approaches. Moreover, we have been able to create truly sparse multilayer perceptron models with over one million neurons and to train them on a typical laptop without a GPU, which is well beyond what is possible with any state-of-the-art technique.
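To make the mechanism behind the abstract concrete: sparse evolutionary training (SET) keeps the parameter count fixed by periodically removing the weakest connections and regrowing the same number at random empty positions, so the topology adapts while memory scales with the number of connections rather than the dense layer size. Below is a minimal sketch of one such evolution step on a truly sparse SciPy weight matrix; the function name prune_and_regrow, the pruning fraction zeta=0.3, and the small-Gaussian re-initialization are illustrative assumptions, not the paper's exact settings.

import numpy as np
from scipy import sparse

def prune_and_regrow(w, zeta=0.3, rng=None):
    """One SET-style evolution step on a truly sparse weight matrix.

    Removes the fraction `zeta` of connections with the smallest
    magnitude, then regrows the same number of connections at random
    empty positions, so the parameter count stays fixed.
    """
    rng = rng or np.random.default_rng()
    w = w.tocoo()
    n_prune = int(zeta * w.nnz)

    # Keep the (1 - zeta) fraction of largest-magnitude weights.
    keep = np.argsort(np.abs(w.data))[n_prune:]
    rows, cols, vals = w.row[keep], w.col[keep], w.data[keep]

    # Sample n_prune new positions that are currently empty.
    occupied = set(zip(rows.tolist(), cols.tolist()))
    new_rows, new_cols = [], []
    while len(new_rows) < n_prune:
        i = int(rng.integers(w.shape[0]))
        j = int(rng.integers(w.shape[1]))
        if (i, j) not in occupied:
            occupied.add((i, j))
            new_rows.append(i)
            new_cols.append(j)

    # Regrown connections start with small random weights (an
    # illustrative re-initialization, not the paper's exact scheme).
    new_vals = rng.normal(0.0, 0.01, size=n_prune)
    return sparse.coo_matrix(
        (np.concatenate([vals, new_vals]),
         (np.concatenate([rows, np.array(new_rows, dtype=rows.dtype)]),
          np.concatenate([cols, np.array(new_cols, dtype=cols.dtype)]))),
        shape=w.shape,
    ).tocsr()

# Example: a 1000x1000 layer at ~1% density evolves entirely in sparse form.
w = sparse.random(1000, 1000, density=0.01, format="csr", random_state=0)
w = prune_and_regrow(w)

Because the weights live in a sparse data structure throughout, rather than as a dense array under a binary mask, memory and compute grow with the number of connections, which is what allows million-neuron MLPs to train on a laptop without a GPU.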
Pages: 2589-2604
Page count: 16