An accelerated proximal algorithm for regularized nonconvex and nonsmooth bi-level optimization

被引:1
作者
Chen, Ziyi [1 ]
Kailkhura, Bhavya [2 ]
Zhou, Yi [1 ]
机构
[1] Univ Utah, Elect & Comp Dept, 50 Cent Campus Dr 2110, Salt Lake City, UT 84112 USA
[2] Lawrence Livermore Natl Lab, 7000 East Ave, Livermore, CA 10587 USA
基金
美国国家科学基金会;
关键词
Bilevel optimization; Nesterov's momentum; Nonconvex regularization; Proximal algorithm; CONVERGENCE;
D O I
10.1007/s10994-023-06329-6
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from a high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies the Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm through identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity (O) over tilde(?(3.5)e(-2))over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a Lojasiewicz-type gradient inequality. Experiment on hyper-parameter optimization demonstrates the effectiveness of our algorithm.
引用
收藏
页码:1433 / 1463
页数:31
相关论文
共 56 条
[1]  
[Anonymous], 2013, Introductory lectures on convex optimization: a basic course
[2]  
[Anonymous], 2019, Hyperparameter Optimization, DOI [DOI 10.1007/978-3-030-05318-51, 10.1007/978-3-030-05318-5_1]
[3]   On the convergence of the proximal algorithm for nonsmooth functions involving analytic features [J].
Attouch, Hedy ;
Bolte, Jerome .
MATHEMATICAL PROGRAMMING, 2009, 116 (1-2) :5-16
[4]  
Bertinetto L., 2018, PROCEEDING INT C LEA
[5]   THE LOJASIEWICZ INEQUALITY FOR NONSMOOTH SUBANALYTIC FUNCTIONS WITH APPLICATIONS TO SUBGRADIENT DYNAMICAL SYSTEMS [J].
Bolte, Jerome ;
Daniilidis, Aris ;
Lewis, Adrian .
SIAM JOURNAL ON OPTIMIZATION, 2007, 17 (04) :1205-1223
[6]   Proximal alternating linearized minimization for nonconvex and nonsmooth problems [J].
Bolte, Jerome ;
Sabach, Shoham ;
Teboulle, Marc .
MATHEMATICAL PROGRAMMING, 2014, 146 (1-2) :459-494
[7]   MATHEMATICAL PROGRAMS WITH OPTIMIZATION PROBLEMS IN CONSTRAINTS [J].
BRACKEN, J ;
MCGILL, JT .
OPERATIONS RESEARCH, 1973, 21 (01) :37-44
[8]  
Chen T., 2021, ARXIV
[9]  
Chen Z., 2021, PROCEEDING INT C LEA
[10]  
Dagreou M., 2022, ADV NEUR IN