STOCHASTIC MULTILEVEL COMPOSITION OPTIMIZATION ALGORITHMS WITH LEVEL-INDEPENDENT CONVERGENCE RATES

Cited by: 16
Authors
Balasubramanian, Krishnakumar [1 ]
Ghadimi, Saeed [2 ]
Nguyen, Anthony [3 ]
Affiliations
[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
[2] Univ Waterloo, Dept Management Sci, Waterloo, ON N2L 3G1, Canada
[3] Univ Calif Davis, Dept Math, Davis, CA 95616 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
multilevel stochastic composition; nonconvex optimization; level-independent convergence rate; complexity bounds; approximation method; gradient descent
DOI
10.1137/21M1406222
Chinese Library Classification (CLC)
O29 [Applied Mathematics]
Subject Classification
070104
Abstract
In this paper, we study smooth stochastic multilevel composition optimization problems, where the objective function is a nested composition of T functions. We assume access to noisy evaluations of the functions and their gradients through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms using moving-average stochastic estimates and analyze their convergence to an ε-stationary point of the problem. We show that the first algorithm, which is a generalization of [S. Ghadimi, A. Ruszczynski, and M. Wang, SIAM J. Optim., 30 (2020), pp. 960-979] to the T-level case, can achieve a sample complexity of O_T(1/ε^6) by using minibatches of samples in each iteration, where O_T hides constants that depend on T. By modifying this algorithm using linearized stochastic estimates of the function values, we improve the sample complexity to O_T(1/ε^4). This modification not only removes the requirement of having a minibatch of samples in each iteration, but also makes the algorithm parameter-free and easy to implement. To the best of our knowledge, this is the first time that such an online algorithm designed for the (un)constrained multilevel setting obtains the same sample complexity as the smooth single-level setting, under standard assumptions (unbiasedness and boundedness of the second moments) on the stochastic first-order oracle.
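To make the moving-average idea in the abstract concrete, the sketch below treats a toy two-level instance min_x f1(f2(x)): the inner function value is tracked by a moving average, and the gradient is formed by the chain rule through the tracked value. This is only an illustration of the general technique, not the paper's actual algorithms; the problem data (A, b), the oracles oracle_f2 and oracle_grad_f1, and the step-size schedules are all invented for demonstration.

    # Illustrative sketch (not the paper's method): two-level stochastic
    # composition min_x f1(f2(x)) with a moving-average tracker of f2(x).
    # All data, oracles, and schedules below are hypothetical.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))      # toy inner map: f2(x) = A x
    b = rng.standard_normal(5)           # toy outer function: f1(u) = 0.5*||u - b||^2
    noise = 0.1                          # oracle noise level

    def oracle_f2(x):
        # Noisy evaluation of the inner function and its (exact) Jacobian.
        return A @ x + noise * rng.standard_normal(5), A

    def oracle_grad_f1(u):
        # Noisy gradient of the outer function at u.
        return (u - b) + noise * rng.standard_normal(5)

    x = np.zeros(3)
    u = oracle_f2(x)[0]                  # moving-average estimate of f2(x)
    for k in range(1, 2001):
        beta = 1.0 / np.sqrt(k)          # averaging weight (illustrative schedule)
        alpha = 0.1 / np.sqrt(k)         # step size (illustrative schedule)
        f2_val, J2 = oracle_f2(x)
        u = (1 - beta) * u + beta * f2_val   # update the inner-value tracker
        g = J2.T @ oracle_grad_f1(u)         # chain-rule gradient through the tracker
        x = x - alpha * g                    # stochastic gradient step

    print("approximate solution:", x)
    print("least-squares reference:", np.linalg.lstsq(A, b, rcond=None)[0])

In this toy problem the noiseless minimizer is the least-squares solution of A x = b, so the final print statements give a quick sanity check that the tracked-value updates drive x toward it.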
Pages: 519-544
Page count: 26
References
38 in total
[1] Anastasiou, A., Proceedings of Machine Learning Research, 2019, p. 115.
[2] [Anonymous], Semiparametric Regression, 2003, DOI: 10.1017/CBO9780511755453.
[3] Arjevani, Y., preprint, arXiv:191202…, 2019.
[4] Blanchet, J., preprint, 2017.
[5] Bora, A., Proceedings of Machine Learning Research, vol. 70, 2017.
[6] Borkar, V. S., Stochastic Approximation: A Dynamical Systems Viewpoint, vol. 48, 2009.
[7] Chen, T., Sun, Y., and Yin, W., Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization, IEEE Transactions on Signal Processing, 69 (2021), pp. 4937-4948.
[8] Cong, W., Forsati, R., Kandemir, M., and Mahdavi, M., Minimal variance sampling with provable guarantees for fast training of graph neural networks, in KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 1393-1403.
[9] Davis, D., and Drusvyatskiy, D., Stochastic model-based minimization of weakly convex functions, SIAM Journal on Optimization, 29 (2019), no. 1, pp. 207-239.
[10] Dentcheva, D., Penev, S., and Ruszczynski, A., Statistical estimation of composite risk functionals and risk optimization problems, Annals of the Institute of Statistical Mathematics, 69 (2017), no. 4, pp. 737-760.