On Convergence of Gradient Descent Ascent: A Tight Local Analysis

Cited by: 0
Authors
Li, Haochuan [1 ]
Farnia, Farzan [2 ]
Das, Subhro [3 ]
Jadbabaie, Ali [4 ]
Affiliations
[1] MIT, Dept EECS, Cambridge, MA 02139 USA
[2] Chinese Univ Hong Kong, Dept CSE, Hong Kong, Peoples R China
[3] MIT, IBM Watson AI Lab, IBM Res, Cambridge, MA 02139 USA
[4] MIT, Dept CEE, Cambridge, MA 02139 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022
Keywords
EQUILIBRIA
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_x \max_y f(x, y)$ where $f$ is strongly concave in $y$ and possibly nonconvex in $x$, (Lin et al., 2020) proved the convergence of GDA with a stepsize ratio $\eta_y/\eta_x = \Theta(\kappa^2)$, where $\eta_x$ and $\eta_y$ are the stepsizes for $x$ and $y$ and $\kappa$ is the condition number for $y$. While this stepsize ratio suggests slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the local convergence of general nonconvex-nonconcave minimax problems. We demonstrate that a stepsize ratio of $\Theta(\kappa)$ is necessary and sufficient for local convergence of GDA to a Stackelberg Equilibrium, where $\kappa$ is the local condition number for $y$. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient (EG) methods. Finally, we conduct several numerical experiments to support our theoretical findings.
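The $\Theta(\kappa)$ stepsize-ratio claim can be illustrated with a local linear-stability check: near a stationary point, one simultaneous GDA step acts as a linear map, and GDA converges locally exactly when the spectral radius of that map is below 1. The sketch below is a minimal toy example, not taken from the paper; the quadratic instance, the constants a, b, mu, ell, and the stepsize choice eta_y = 1/(2*ell) are all assumptions made for illustration.

```python
import numpy as np

# Toy quadratic minimax problem (illustrative assumption, not from the paper):
#   f(x, y) = -0.5*a*x**2 + x*(b @ y) - 0.5 * y @ C @ y
# with scalar x and 2-dimensional y.  f is strongly concave in y (C > 0) and
# nonconvex in x (a > 0).  The origin is a local Stackelberg equilibrium since
# the Schur complement -a + b @ inv(C) @ b = 9.1 is positive.
a = 1.0
b = np.array([1.0, 1.0])
mu, ell = 0.1, 10.0                  # strong-concavity / smoothness constants in y
C = np.diag([mu, ell])
kappa = ell / mu                     # local condition number for y

def gda_jacobian(eta_x, eta_y):
    """Jacobian of one simultaneous GDA step, linearized at the origin."""
    # x_{t+1} = (1 + eta_x*a)*x_t - eta_x * (b @ y_t)
    # y_{t+1} = eta_y * b * x_t + (I - eta_y*C) @ y_t
    return np.block([
        [np.array([[1.0 + eta_x * a]]), -eta_x * b[None, :]],
        [eta_y * b[:, None],            np.eye(2) - eta_y * C],
    ])

eta_y = 1.0 / (2.0 * ell)            # a conservative stepsize for the max player
for ratio in [1.0, kappa, kappa ** 2]:
    J = gda_jacobian(eta_y / ratio, eta_y)
    rho = np.max(np.abs(np.linalg.eigvals(J)))
    verdict = "locally convergent" if rho < 1 else "diverges locally"
    print(f"eta_y/eta_x = {ratio:8.1f}   spectral radius = {rho:.4f}   ({verdict})")
```

For these particular constants, the equal-stepsize configuration fails the stability check, while both larger ratios pass; with $\eta_y$ fixed, the $\Theta(\kappa)$ ratio gives a spectral radius further below 1 than the $\Theta(\kappa^2)$ ratio, i.e., faster local convergence.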
Pages: 24
Related Papers
50 records in total
  • [1] Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency
    Deng, Yuyang
    Mahdavi, Mehrdad
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [2] Local Convergence of Gradient Descent-Ascent for Training Generative Adversarial Networks
    Becker, Evan
    Pandit, Parthe
    Rangan, Sundeep
    Fletcher, Alyson K.
    FIFTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2023: 892 - 896
  • [3] A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
    Arjevani, Yossi
    Shamir, Ohad
    Srebro, Nathan
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 111 - 132
  • [4] On the Convergence of Stochastic Compositional Gradient Descent Ascent Method
    Gao, Hongchang
    Wang, Xiaoqian
    Luo, Lei
    Shi, Xinghua
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2389 - 2395
  • [5] Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization
    Zhang, Guodong
    Wang, Yuanhao
    Lessard, Laurent
    Grosse, Roger
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [6] Tight Convergence Rate of Gradient Descent for Eigenvalue Computation
    Ding, Qinghua
    Zhou, Kaiwen
    Cheng, James
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3276 - 3282
  • [7] A Comprehensively Tight Analysis of Gradient Descent for PCA
    Xu, Zhiqiang
    Li, Ping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum
    Gao, Hongchang
    Li, Junyi
    Huang, Heng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [9] Randomized Stochastic Gradient Descent Ascent
    Sebbouh, Othmane
    Cuturi, Marco
    Peyre, Gabriel
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [10] Convergence analysis of gradient descent stochastic algorithms
    Shapiro, A
    Wardi, Y
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1996, 91 (02) : 439 - 454