Dropout as a Low-Rank Regularizer for Matrix Factorization

Cited by: 0
Authors
Cavazza, Jacopo [1]
Haeffele, Benjamin D. [2]
Lane, Connor [2]
Morerio, Pietro [1]
Murino, Vittorio [1]
Vidal, Rene [2]
Affiliations
[1] Ist Italiano Tecnol, Pattern Anal & Comp Vis, I-16163 Genoa, Italy
[2] Johns Hopkins Univ, Ctr Imaging Sci, Baltimore, MD 21218 USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, Vol. 84, 2018
Funding
US National Science Foundation
Keywords
DOI: Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Dropout is a simple yet effective regularization technique that has been applied to various machine learning tasks, including linear classification, matrix factorization (MF), and deep learning. However, despite its solid empirical performance, the theoretical properties of dropout as a regularizer remain quite elusive. In this paper, we present a theoretical analysis of dropout for MF, where Bernoulli random variables are used to drop columns of the factors. We demonstrate the equivalence between dropout and a fully deterministic model for MF in which the factors are regularized by the sum, over columns, of the product of the squared Euclidean norms of the corresponding columns of the two factors. Additionally, we investigate the case of a variable-sized factorization and prove that dropout is equivalent to a convex approximation problem with (squared) nuclear-norm regularization. As a consequence, we conclude that dropout induces a low-rank regularizer that results in a data-dependent singular-value thresholding.
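To make the claimed equivalence concrete, the following minimal Python/NumPy sketch compares a Monte Carlo estimate of a dropout objective for MF against a deterministic objective regularized by the sum of products of squared column norms. The setup is an assumption made for illustration (retain probability theta, columns of U and V dropped jointly by a Bernoulli mask, reconstruction rescaled by 1/theta, regularization weight (1 - theta)/theta); variable names are illustrative and not taken verbatim from the paper.

# Minimal numerical sketch (assumed setup, see lead-in above).
import numpy as np

rng = np.random.default_rng(0)
m, n, d, theta = 20, 15, 8, 0.7           # data size, number of factor columns, retain probability

X = rng.standard_normal((m, n))
U = rng.standard_normal((m, d))
V = rng.standard_normal((n, d))            # factorization model: X ~ U @ V.T

def dropout_loss(U, V, X, theta, n_samples=50_000):
    """Monte Carlo estimate of E_r ||X - (1/theta) * U diag(r) V^T||_F^2,
    where r_i ~ Bernoulli(theta) drops column i of U and V jointly."""
    total = 0.0
    for _ in range(n_samples):
        r = rng.random(d) < theta                    # Bernoulli column mask
        recon = (U[:, r] @ V[:, r].T) / theta        # rescaled reconstruction
        total += np.sum((X - recon) ** 2)
    return total / n_samples

def deterministic_loss(U, V, X, theta):
    """||X - U V^T||_F^2 + ((1 - theta)/theta) * sum_i ||u_i||^2 * ||v_i||^2."""
    fit = np.sum((X - U @ V.T) ** 2)
    reg = np.sum(np.sum(U ** 2, axis=0) * np.sum(V ** 2, axis=0))
    return fit + (1 - theta) / theta * reg

print(dropout_loss(U, V, X, theta))        # agrees up to Monte Carlo noise
print(deterministic_loss(U, V, X, theta))

Up to Monte Carlo noise, the two printed values coincide, which is the sense in which dropout acts as a deterministic regularizer on the products of the factors' column norms under this assumed setup.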
Pages: 10