Powered stochastic optimization with hypergradient descent for large-scale learning systems

Cited by: 1
Authors
Yang, Zhuang [1 ]
Li, Xiaotian [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Keywords
Powerball function; Stochastic optimization; Variance reduction; Hypergradient descent; Adaptive learning rate; ALGORITHMS; SELECTION;
DOI
10.1016/j.eswa.2023.122017
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stochastic optimization (SO) algorithms based on the Powerball function, namely powered stochastic optimization (PoweredSO) algorithms, have been shown to be effective and have demonstrated great potential in large-scale optimization and machine learning tasks. Nevertheless, how to determine the learning rate for PoweredSO remains a challenging, unsolved problem. In this paper, we propose a class of adaptive PoweredSO approaches that are efficient, scalable and robust. They take advantage of the hypergradient descent (HD) technique to automatically acquire an online learning rate for PoweredSO-like methods. In the first part, we study the behavior of the canonical PoweredSO algorithm, the Powerball stochastic gradient descent (pbSGD) method, combined with HD. Existing PoweredSO algorithms also suffer from high variance, arising from their sampling tactics, because they adopt an algorithmic framework similar to that of SO algorithms. The second part therefore develops an adaptive powered variance-reduced optimization method that combines a variance-reduction technique with HD. Moreover, we present a convergence analysis of the proposed algorithms and explore their iteration complexity in non-convex settings. Numerical experiments on machine learning tasks verify their superior performance over modern SO algorithms.
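The combination of pbSGD with an HD-adapted learning rate described in the abstract can be sketched roughly as follows, assuming the standard formulations: the Powerball function sigma_gamma(g) = sign(g) * |g|^gamma applied elementwise to the gradient (Yuan et al.'s Powerball methods), and the hypergradient learning-rate update of Baydin et al. (2018), which nudges the step size along the inner product of the current gradient and the previous descent direction. All function names, hyperparameter values, and the toy objective are illustrative, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' code): pbSGD with a hypergradient-
# descent (HD) learning rate on a full-gradient toy problem.
import numpy as np

def powerball(g, gamma):
    """Elementwise Powerball transform: sign(g) * |g|^gamma."""
    return np.sign(g) * np.abs(g) ** gamma

def pbsgd_hd(grad_fn, x0, alpha=0.1, beta=1e-3, gamma=0.6, n_steps=200):
    """Powerball (stochastic) gradient descent with an HD-adapted step size.

    alpha : initial learning rate, adapted online by HD
    beta  : hypergradient step size for the alpha update
    gamma : Powerball exponent in (0, 1]
    """
    x = np.asarray(x0, dtype=float)
    prev_dir = np.zeros_like(x)          # previous descent direction
    for _ in range(n_steps):
        g = grad_fn(x)
        # HD step: move alpha against d(loss)/d(alpha) ~ -g . prev_dir,
        # i.e. grow alpha when successive directions agree, shrink it
        # when they disagree (oscillation).
        alpha += beta * np.dot(g, prev_dir)
        prev_dir = powerball(g, gamma)   # powered gradient direction
        x -= alpha * prev_dir
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_star = pbsgd_hd(lambda x: x, x0=[3.0, -2.0])
```

In a genuinely stochastic setting, `grad_fn` would return a mini-batch gradient, and the paper's second contribution would additionally pass it through a variance-reduction correction (e.g. an SVRG-style control variate) before the Powerball transform.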
Pages: 14