A homotopy training algorithm for fully connected neural networks

Cited by: 15
Authors
Chen, Qipin [1]
Hao, Wenrui [1]
Affiliations
[1] Penn State Univ, Dept Math, University Pk, PA 16802 USA
Source
PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES | 2019 / Vol. 475 / Issue 2231
Funding
National Science Foundation (USA)
Keywords
homotopy method; training algorithm; machine learning; neural network; PARAMETER-ESTIMATION; FEEDFORWARD NETWORKS; EQUATIONS;
DOI
10.1098/rspa.2019.0662
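Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract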
In this paper, we present a homotopy training algorithm (HTA) to solve optimization problems arising from fully connected neural networks with complicated structures. The HTA dynamically builds the neural network starting from a simplified version and ending with the fully connected network via adding layers and nodes adaptively. Therefore, the corresponding optimization problem is easy to solve at the beginning and connects to the original model via a continuous path guided by the HTA, which provides a high probability of obtaining a global minimum. By gradually increasing the complexity of the model along the continuous path, the HTA provides a rather good solution to the original loss function. This is confirmed by various numerical results including VGG models on CIFAR-10. For example, on the VGG13 model with batch normalization, HTA reduces the error rate by 11.86% on the test dataset compared with the traditional method. Moreover, the HTA also allows us to find the optimal structure for a fully connected neural network by building the neural network adaptively.
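The idea of following a path from a simplified model to the full one can be illustrated with a toy sketch. The abstract does not give the form of the homotopy, so the linear blend H = (1 - t) * y1 + t * y2 used here is an assumption, as are the one-dimensional model, the function names, the discrete schedule for t, and the finite-difference gradients. It is a minimal illustration of warm-started continuation, not the authors' implementation.

```python
import math

def blended_output(theta, t, x):
    """Assumed homotopy: H(x; theta, t) = (1 - t) * y1 + t * y2."""
    w, b, v, c = theta
    z = w * x + b                  # layer shared by both models
    y1 = z                         # simplified model: linear read-out
    y2 = v * math.tanh(z) + c      # fuller model: nonlinearity plus an output layer
    return (1.0 - t) * y1 + t * y2

def loss(theta, t, xs, ys):
    """Mean squared error of the blended model at homotopy parameter t."""
    return sum((blended_output(theta, t, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def numerical_grad(theta, t, xs, ys, eps=1e-6):
    """Central finite differences; avoids hand-derived gradients in this sketch."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((loss(up, t, xs, ys) - loss(down, t, xs, ys)) / (2 * eps))
    return grad

def homotopy_train(xs, ys, theta0, stages=11, steps_per_stage=200, lr=0.1):
    """Step t from 0 to 1, warm-starting each stage from the previous solution."""
    theta = list(theta0)
    for k in range(stages):
        t = k / (stages - 1)       # discretized stand-in for the continuous path
        for _ in range(steps_per_stage):
            g = numerical_grad(theta, t, xs, ys)
            theta = [p - lr * gi for p, gi in zip(theta, g)]
    return theta

# Hypothetical toy data: fit y = tanh(2x) on a grid over [-1, 1].
xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [math.tanh(2.0 * x) for x in xs]
theta0 = [0.1, 0.0, 0.1, 0.0]
theta = homotopy_train(xs, ys, theta0)
```

At t = 0 only the simplified model contributes, so the first stage is an easy least-squares fit. Each later stage starts from the previous solution, and at t = 1 the parameters are those of the fuller model alone.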
Pages: 19