Gradient Descent with Low-Rank Objective Functions

Times Cited: 0
Authors
Cosson, Romain [1 ]
Jadbabaie, Ali [2 ]
Makur, Anuran [3 ,4 ]
Reisizadeh, Amirhossein [2 ]
Shah, Devavrat [2 ]
Affiliations
[1] INRIA, Paris, France
[2] MIT, Lab Informat & Decis Syst, Cambridge, MA 02139 USA
[3] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[4] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Source
2023 62nd IEEE Conference on Decision and Control (CDC) | 2023
Keywords
APPROXIMATION;
DOI
10.1109/CDC49753.2023.10383652
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space. In this paper, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an ε-minimizer of a p-dimensional function by first identifying r ≤ p significant directions, and then estimating the true p-dimensional gradient at every iteration by computing directional derivatives only along those r directions. We establish that the "directional oracle complexity" of LRGD for strongly convex objective functions is O(r log(1/ε) + rp). Therefore, when r ≪ p, LRGD provides a significant improvement over the known O(p log(1/ε)) complexity of GD in the strongly convex setting. Furthermore, using real and synthetic data, we empirically find that LRGD provides significant gains over GD when the data has low-rank structure, and in the absence of such structure, LRGD does not degrade performance compared to GD.
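The abstract describes LRGD only at a high level, so the sketch below is a minimal illustrative reconstruction of the idea, not the authors' published pseudocode. It assumes the r significant directions are estimated from the singular vectors of a few sampled gradients, and the names estimate_subspace and lrgd_minimize, as well as all parameter choices, are hypothetical.

```python
import numpy as np

def estimate_subspace(grad_fn, dim, r, num_samples=20, seed=0):
    """Estimate r significant directions from gradients sampled at random
    points (one plausible choice; the paper's own procedure may differ)."""
    rng = np.random.default_rng(seed)
    G = np.stack([grad_fn(rng.standard_normal(dim)) for _ in range(num_samples)])
    # Top-r right singular vectors span the dominant gradient directions.
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    return vt[:r].T  # dim x r orthonormal basis

def lrgd_minimize(f, grad_fn, x0, r, step=0.1, iters=200, fd_eps=1e-6):
    """Low-rank gradient descent sketch: each iteration queries only r
    directional derivatives (finite differences stand in for a
    directional-derivative oracle) instead of a full p-dimensional gradient."""
    p = x0.size
    U = estimate_subspace(grad_fn, p, r)  # one-time subspace identification
    x = x0.astype(float).copy()
    for _ in range(iters):
        fx = f(x)
        d = np.array([(f(x + fd_eps * U[:, j]) - fx) / fd_eps for j in range(r)])
        x = x - step * (U @ d)  # descend along the low-rank gradient estimate
    return x

# Toy usage: a quadratic on R^50 that effectively varies along only 2 directions.
A = np.diag([10.0, 10.0] + [0.0] * 48)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = lrgd_minimize(f, grad, x0=np.ones(50), r=2)
```

Under this reading, the one-time subspace step costs on the order of rp directional queries and each iteration costs r, which is consistent with the O(r log(1/ε) + rp) oracle complexity quoted in the abstract.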
Pages: 3309-3314
Number of pages: 6