Gradient Descent with Low-Rank Objective Functions

Times Cited: 0
Authors
Cosson, Romain [1 ]
Jadbabaie, Ali [2 ]
Makur, Anuran [3 ,4 ]
Reisizadeh, Amirhossein [2 ]
Shah, Devavrat [2 ]
Affiliations
[1] INRIA, Paris, France
[2] MIT, Lab Informat & Decis Syst, Cambridge, MA 02139 USA
[3] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[4] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Source
2023 62nd IEEE Conference on Decision and Control (CDC) | 2023
Keywords
APPROXIMATION;
DOI
10.1109/CDC49753.2023.10383652
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space. In this paper, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an ε-minimizer of a p-dimensional function by first identifying r ≤ p significant directions, and then estimating the true p-dimensional gradient at every iteration by computing directional derivatives only along those r directions. We establish that the "directional oracle complexity" of LRGD for strongly convex objective functions is O(r log(1/ε) + rp). Therefore, when r ≪ p, LRGD provides a significant improvement over the known O(p log(1/ε)) complexity of GD in the strongly convex setting. Furthermore, using real and synthetic data, we empirically find that LRGD provides significant gains over GD when the data has low-rank structure, and in the absence of such structure, LRGD does not degrade performance compared to GD.
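The abstract describes LRGD only at a high level, so the sketch below is a minimal illustrative reconstruction of the idea, not the authors' published pseudocode. It assumes the r significant directions are estimated from the singular vectors of a few sampled gradients, and the names estimate_subspace and lrgd_minimize, as well as all parameter choices, are hypothetical.

```python
import numpy as np

def estimate_subspace(grad_fn, dim, r, num_samples=20, seed=0):
    """Estimate r significant directions from gradients sampled at random
    points (one plausible choice; the paper's own procedure may differ)."""
    rng = np.random.default_rng(seed)
    G = np.stack([grad_fn(rng.standard_normal(dim)) for _ in range(num_samples)])
    # Top-r right singular vectors span the dominant gradient directions.
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    return vt[:r].T  # dim x r orthonormal basis

def lrgd_minimize(f, grad_fn, x0, r, step=0.1, iters=200, fd_eps=1e-6):
    """Low-rank gradient descent sketch: each iteration queries only r
    directional derivatives (finite differences stand in for a
    directional-derivative oracle) instead of a full p-dimensional gradient."""
    p = x0.size
    U = estimate_subspace(grad_fn, p, r)  # one-time subspace identification
    x = x0.astype(float).copy()
    for _ in range(iters):
        fx = f(x)
        d = np.array([(f(x + fd_eps * U[:, j]) - fx) / fd_eps for j in range(r)])
        x = x - step * (U @ d)  # descend along the low-rank gradient estimate
    return x

# Toy usage: a quadratic on R^50 that effectively varies along only 2 directions.
A = np.diag([10.0, 10.0] + [0.0] * 48)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = lrgd_minimize(f, grad, x0=np.ones(50), r=2)
```

Under this reading, the one-time subspace step costs on the order of rp directional queries and each iteration costs r, which is consistent with the O(r log(1/ε) + rp) oracle complexity quoted in the abstract.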
Pages: 3309-3314
Number of pages: 6