The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems

Times Cited: 1
Authors
Yang, Zhuang [1 ]
Affiliation
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024
Funding
National Natural Science Foundation of China
Keywords
Optimization; Convergence; Approximation algorithms; Stochastic processes; Learning systems; Support vector machines; Noise measurement; Biased gradient estimator; convergence rates; large-scale datasets; Powerball function; stochastic optimization (SO); DESCENT; CONVERGENCE;
DOI
10.1109/TCSS.2024.3411630
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
The Powerball method, which incorporates a power coefficient into conventional optimization algorithms, has in recent years been used to accelerate stochastic optimization (SO) algorithms, giving rise to a family of powered stochastic optimization (PSO) algorithms. Although the Powerball technique is orthogonal to existing acceleration techniques for SO algorithms (e.g., learning-rate adjustment strategies), current PSO algorithms adopt nearly the same algorithmic framework as SO algorithms; a direct negative consequence is that they inherit the low convergence rates and unstable performance of SO on practical problems. Motivated by this gap, this work develops a novel class of PSO algorithms from the perspective of biased stochastic gradient estimation (BSGE). Specifically, we first explore the theoretical properties and empirical characteristics of vanilla powered stochastic gradient descent (P-SGD) with BSGE. Second, to further demonstrate the positive impact of BSGE in enhancing P-SGD-type algorithms, we investigate the theoretical and experimental behavior of P-SGD with momentum under BSGE, focusing in particular on the effect of negative momentum in P-SGD, which has been little studied in PSO. Moreover, we prove that the overall complexity of the resulting algorithms matches that of advanced SO algorithms. Finally, extensive numerical experiments on benchmark datasets confirm the effectiveness of BSGE in improving PSO. This work clarifies the role of BSGE in PSO algorithms and extends the family of PSO algorithms.
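The record itself contains no code; the following is a minimal sketch of the powered-update idea described in the abstract, assuming the standard Powerball function sigma_gamma(z) = sign(z) * |z|^gamma with gamma in (0, 1). The function names, hyperparameter defaults, and the `grad_fn` stub are illustrative assumptions, not the paper's actual algorithm; in particular, the biased gradient estimator (BSGE) analyzed in the paper is only stubbed out, and `beta < 0` merely illustrates the negative-momentum variant the abstract mentions.

```python
import numpy as np

def powerball(z, gamma):
    """Elementwise Powerball function: sign(z) * |z|**gamma, gamma in (0, 1)."""
    return np.sign(z) * np.abs(z) ** gamma

def p_sgd_momentum(grad_fn, w0, gamma=0.5, lr=0.1, beta=-0.3, n_iters=100):
    """Hypothetical powered SGD with (possibly negative) momentum.

    grad_fn(w) stands in for a stochastic gradient estimate; the biased
    estimator studied in the paper is paper-specific and not reproduced here.
    """
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)                        # momentum buffer
    for _ in range(n_iters):
        g = grad_fn(w)                          # (biased) stochastic gradient estimate
        v = beta * v + powerball(g, gamma)      # beta < 0 gives negative momentum
        w = w - lr * v                          # powered descent step
    return w
```

With gamma = 1 and beta = 0 the update reduces to plain SGD, which is one way to sanity-check a sketch like this against a known baseline.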
Pages: 13
Related Papers
50 records
  • [1] Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (06) : 1598 - 1606
  • [2] Large-scale machine learning with fast and stable stochastic conjugate gradient
    Yang, Zhuang
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 173
  • [3] Value function gradient learning for large-scale multistage stochastic programming problems
    Lee, Jinkyu
    Bae, Sanghyeon
    Kim, Woo Chang
    Lee, Yongjae
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 308 (01) : 321 - 335
  • [4] SAAGs: Biased stochastic variance reduction methods for large-scale learning
    Chauhan, Vinod Kumar
    Sharma, Anuj
    Dahiya, Kalpana
    APPLIED INTELLIGENCE, 2019, 49 (09) : 3331 - 3361
  • [5] Controllability Maximization of Large-Scale Systems Using Projected Gradient Method
    Sato, Kazuhiro
    Takeda, Akiko
    IEEE CONTROL SYSTEMS LETTERS, 2020, 4 (04): : 821 - 826
  • [6] MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING
    Wiesler, Simon
    Richard, Alexander
    Schlueter, Ralf
    Ney, Hermann
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [7] A Stochastic Quasi-Newton Method for Large-Scale Nonconvex Optimization With Applications
    Chen, Huiming
    Wu, Ho-Chun
    Chan, Shing-Chow
    Lam, Wong-Hing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) : 4776 - 4790
  • [8] On Biased Stochastic Gradient Estimation
    Driggs, Derek
    Liang, Jingwei
    Schonlieb, Carola-Bibiane
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [9] A large-scale stochastic gradient descent algorithm over a graphon
    Chen, Yan
    Li, Tao
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 4806 - 4811
  • [10] Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning
    Liu, Yuanyuan
    Shang, Fanhua
    Liu, Hongying
    Kong, Lin
    Jiao, Licheng
    Lin, Zhouchen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12) : 4242 - 4255