Guided Layer-Wise Learning for Deep Models Using Side Information

Times Cited: 0
Authors
Sulimov, Pavel [1 ]
Sukmanova, Elena [1 ]
Chereshnev, Roman [1 ]
Kertesz-Farkas, Attila [1 ]
Affiliation
[1] Natl Res Univ Higher Sch Econ HSE, Dept Data Anal & Artificial Intelligence, Fac Comp Sci, 3 Kochnovsky Proezd, Moscow, Russia
Source
ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS (AIST 2019) | 2020, Vol. 1086
Keywords
Deep learning; Variational methods; Abstract representation; BHATTACHARYYA DISTANCE; NETWORKS;
DOI
10.1007/978-3-030-39575-9_6
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Training deep models for classification is hindered by local minima and vanishing gradients, while unsupervised layer-wise pretraining does not exploit information from class labels. Here, we propose a new regularization technique, called diversifying regularization (DR), which penalizes hidden units at any layer if they produce similar features for different types of data. For generative models, DR is defined as a divergence over the variational posterior distributions and enters the maximum likelihood estimation as a prior. DR thus injects class-label information into the greedy pretraining of deep belief networks, which results in a better weight initialization for fine-tuning methods. For discriminative training of deep neural networks, on the other hand, DR is defined as a distance over the features and added to the learning objective. Our experiments show that DR helps backpropagation cope with vanishing gradients and yields faster convergence and smaller generalization errors.
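The discriminative variant described in the abstract, a feature-distance penalty added to the learning objective, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the class-mean summary, and the exponential-of-negative-distance form are assumptions (the paper's keywords suggest, e.g., the Bhattacharyya distance as the underlying measure).

```python
import numpy as np

def diversifying_penalty(hidden, labels):
    """Illustrative DR-style penalty for discriminative training.

    The penalty is large when the hidden-layer features of *different*
    classes are similar, so minimizing it pushes class representations
    apart ("diversifies" them).

    hidden: (n_samples, n_units) activations of one hidden layer
    labels: (n_samples,) integer class labels
    """
    classes = np.unique(labels)
    # Mean hidden representation of each class
    means = np.stack([hidden[labels == c].mean(axis=0) for c in classes])
    penalty, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = np.linalg.norm(means[i] - means[j])  # feature distance
            penalty += np.exp(-d)  # close class means -> large penalty
            pairs += 1
    return penalty / max(pairs, 1)
```

In training, such a term would be scaled by a hyperparameter and added to the task loss (e.g. cross-entropy) at one or more layers, so the gradient both fits the labels and spreads per-class features apart.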
Pages: 50-61 (12 pages)