Evolving Deep Neural Networks via Cooperative Coevolution With Backpropagation

Times Cited: 32
Authors
Gong, Maoguo [1 ]
Liu, Jia [2 ]
Qin, A. K. [3 ]
Zhao, Kun [1 ]
Tan, Kay Chen [4 ]
Affiliations
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Minist Educ, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[3] Swinburne Univ Technol, Dept Comp Sci & Software Engn, Melbourne, Vic 3122, Australia
[4] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
Australian Research Council;
Keywords
Neurons; Feature extraction; Optimization; Training; Biological neural networks; Computer architecture; Backpropagation; Backpropagation (BP); cooperative coevolution (CC); deep neural networks (DNNs); evolutionary optimization; MULTIOBJECTIVE OPTIMIZATION ALGORITHM; EVOLUTION; SEARCH; MODEL;
DOI
10.1109/TNNLS.2020.2978857
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Deep neural networks (DNNs), characterized by sophisticated architectures capable of learning a hierarchy of feature representations, have achieved remarkable successes in various applications. Learning a DNN's parameters is a crucial but challenging task that is commonly tackled with gradient-based backpropagation (BP) methods. However, BP-based methods suffer from severe initialization sensitivity and are prone to becoming trapped in inferior local optima. To address these issues, we propose a DNN learning framework, called BPCC, that hybridizes cooperative coevolution (CC)-based optimization with BP-based gradient descent, and implement it by devising a computationally efficient CC-based optimization technique dedicated to DNN parameter learning. In BPCC, BP executes intermittently for multiple training epochs. Whenever the execution of BP in a training epoch cannot sufficiently decrease the training objective function value, CC kicks in, using the parameter values derived by BP as its starting point. The best parameter values obtained by CC then serve as the starting point of BP in its next training epoch. In CC-based optimization, the overall parameter learning task is decomposed into many subtasks, each learning a small portion of the parameters, and these subtasks are addressed individually in a cooperative manner. In this article, we treat neurons as the basic decomposition units. Furthermore, to reduce the computational cost, we devise a maturity-based subtask selection strategy that selectively solves the subtasks of higher priority. Experimental results demonstrate the superiority of the proposed method over common-practice DNN parameter learning techniques.
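The abstract above outlines the BPCC control flow: BP runs every training epoch, and a CC phase that treats each neuron's parameters as one subtask is triggered only when BP fails to reduce the loss sufficiently. The sketch below illustrates that control flow only, under stated assumptions: all names (bpcc_train, bp_epoch, cc_refine, improvement_tol, neuron_groups) are hypothetical placeholders, and the per-subtask search here is a plain Gaussian perturbation, not the paper's actual CC operators or maturity-based subtask selection.

import numpy as np

def cc_refine(params, neuron_groups, loss, sigma=0.01, iters=5, rng=None):
    # Cooperative-coevolution phase (simplified): each neuron's parameter group
    # is one subtask, perturbed in turn while the remaining parameters are fixed;
    # only improving trials are kept.
    rng = np.random.default_rng(0) if rng is None else rng
    best = loss(params)
    for idx in neuron_groups:
        for _ in range(iters):
            trial = params.copy()
            trial[idx] += sigma * rng.standard_normal(len(idx))
            f = loss(trial)
            if f < best:
                params, best = trial, f
    return params

def bpcc_train(params, bp_epoch, neuron_groups, loss,
               max_epochs=50, improvement_tol=1e-3):
    # Outer BPCC loop: run BP each epoch; hand the current parameters to CC
    # whenever BP decreases the loss by less than improvement_tol, then resume
    # BP from the best parameters CC returns.
    prev = loss(params)
    for _ in range(max_epochs):
        params = bp_epoch(params)
        curr = loss(params)
        if prev - curr < improvement_tol:
            params = cc_refine(params, neuron_groups, loss)
            curr = loss(params)
        prev = curr
    return params

For example, with a flattened parameter vector, neuron_groups could map each hidden neuron to the indices of its incoming weights and bias, matching the neuron-level decomposition described in the abstract.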
Pages: 420-434
Number of pages: 15