On the Convergence Rate of Quasi-Newton Methods on Strongly Convex Functions with Lipschitz Gradient

Cited by: 3
Authors
Krutikov, Vladimir [1 ,2 ]
Tovbis, Elena [3 ]
Stanimirovic, Predrag [1 ,4 ]
Kazakovtsev, Lev [1 ,3 ]
Affiliations
[1] Siberian Fed Univ, Lab Hybrid Methods Modeling & Optimizat Complex Sy, 79 Svobodny Prospekt, Krasnoyarsk 660041, Russia
[2] Kemerovo State Univ, Dept Appl Math, 6 Krasnaya St, Kemerovo 650043, Russia
[3] Reshetnev Siberian State Univ Sci & Technol, Inst Informat & Telecommun, 31 Krasnoyarskii Rabochii Prospekt, Krasnoyarsk 660037, Russia
[4] Univ Nis, Fac Sci & Math, Nish 18000, Serbia
Keywords
minimization; quasi-Newton method; convergence rate; metric SSVM algorithms
DOI
10.3390/math11234715
Chinese Library Classification
O1 [Mathematics];
Discipline Classification Code
0701; 070101;
Abstract
The main results on the convergence rate of quasi-Newton minimization methods were previously obtained under the assumption that the method operates in the neighborhood of the extremum of the function, where a stable quadratic representation of the function exists. In that neighborhood, methods based on a quadratic model of the function show significant advantages over classical gradient methods. When solving a specific problem with a quasi-Newton method, however, a large number of iterations take place outside the extremum neighborhood, where no stable quadratic approximation of the function is available. In this paper, we study the convergence rate of quasi-Newton-type methods on strongly convex functions with a Lipschitz gradient, without relying on local quadratic approximations of the function based on the properties of its Hessian. We prove that quasi-Newton methods converge on strongly convex functions with a Lipschitz gradient at the rate of a geometric progression, and that the estimate of the convergence rate improves as the number of iterations grows, reflecting the fact that the learning (adaptation) effect accumulates as the method operates. Another important fact revealed by the theoretical study is the ability of quasi-Newton methods to eliminate the ill-conditioning that slows down convergence: this elimination is achieved through a linear transformation that normalizes the elongation of the function's level surfaces in different directions. All results were obtained without any assumptions on the matrix of second derivatives of the function being minimized.
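To make the setting concrete, the following minimal sketch (not the authors' method or code) runs a textbook BFGS update with Armijo backtracking on an illustrative strongly convex function with a Lipschitz gradient, namely an ill-conditioned quadratic plus a smooth logistic term, and reports per-iteration contraction factors of the optimality gap. The test problem, constants, and iteration counts are assumptions chosen only to visualize the geometric decay and the gradual adaptation effect described in the abstract.

```python
# Illustrative sketch only: textbook BFGS on an assumed strongly convex test
# function with Lipschitz gradient, tracking how fast f(x_k) - f* shrinks.
import numpy as np

def bfgs(f, grad, x0, iters=60):
    """Textbook BFGS with Armijo backtracking; returns final point and f-history."""
    n = x0.size
    H = np.eye(n)                    # inverse-Hessian approximation (the adapted metric)
    x, g = x0.copy(), grad(x0)
    history = [f(x)]
    for _ in range(iters):
        p = -H @ g                   # quasi-Newton direction
        t, fx = 1.0, f(x)
        # Armijo backtracking line search
        while t > 1e-12 and f(x + t * p) > fx + 1e-4 * t * (g @ p):
            t *= 0.5
        x_new = x + t * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:               # curvature condition; skip update otherwise
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)
        x, g = x_new, g_new
        history.append(f(x))
    return x, np.array(history)

# Assumed test problem: ill-conditioned quadratic (elongated level surfaces)
# plus a smooth logistic term, so the function is strongly convex but not quadratic.
rng = np.random.default_rng(0)
n = 20
D = np.diag(np.logspace(0, 3, n))        # eigenvalues from 1 to 1000
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ D @ Q.T
c = rng.standard_normal(n)

f_obj = lambda x: 0.5 * x @ A @ x + np.logaddexp(0.0, c @ x)
g_obj = lambda x: A @ x + c / (1.0 + np.exp(-np.clip(c @ x, -500, 500)))

x_last, hist = bfgs(f_obj, g_obj, x0=np.ones(n))
gaps = np.maximum(hist - hist[-1], 1e-16)   # proxy for f(x_k) - f*
ratios = gaps[1:30] / gaps[:29]             # per-step contraction factors
print(ratios)                                # factors tend to shrink as the metric adapts
```

In this sketch the early contraction factors are governed by the ill-conditioning of the level surfaces, while later factors improve as the inverse-Hessian approximation accumulates curvature information, which is the qualitative behavior the paper's rate estimates describe.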
Pages: 15