The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems

Times Cited: 1
Authors
Yang, Zhuang [1 ]
Affiliation
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024
Funding
National Natural Science Foundation of China
Keywords
Optimization; Convergence; Approximation algorithms; Stochastic processes; Learning systems; Support vector machines; Noise measurement; Biased gradient estimator; convergence rates; large-scale datasets; Powerball function; stochastic optimization (SO); DESCENT; CONVERGENCE;
DOI
10.1109/TCSS.2024.3411630
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
The Powerball method, which incorporates a power coefficient into conventional optimization algorithms, has in recent years been used to accelerate stochastic optimization (SO) algorithms, giving rise to a family of powered stochastic optimization (PSO) algorithms. Although the Powerball technique is orthogonal to existing acceleration techniques for SO algorithms (e.g., learning-rate adjustment strategies), current PSO algorithms adopt nearly the same algorithmic framework as SO algorithms; a direct negative consequence is that they inherit the low convergence rates and unstable performance of SO on practical problems. Motivated by this gap, this work develops a novel class of PSO algorithms from the perspective of biased stochastic gradient estimation (BSGE). Specifically, we first explore the theoretical properties and empirical characteristics of vanilla powered stochastic gradient descent (P-SGD) with BSGE. Second, to further demonstrate the positive impact of BSGE in enhancing P-SGD-type algorithms, we investigate the theoretical and experimental behavior of P-SGD with momentum under BSGE, focusing in particular on the effect of negative momentum in P-SGD, which has been little studied in PSO. Moreover, we prove that the overall complexity of the resulting algorithms matches that of advanced SO algorithms. Finally, extensive numerical experiments on benchmark datasets confirm the effectiveness of BSGE in improving PSO. This work clarifies the role of BSGE in PSO algorithms and extends the family of PSO algorithms.
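The record itself contains no code; the following is a minimal sketch of the powered-update idea described in the abstract, assuming the standard Powerball function sigma_gamma(z) = sign(z) * |z|^gamma with gamma in (0, 1). The function names, hyperparameter defaults, and the `grad_fn` stub are illustrative assumptions, not the paper's actual algorithm; in particular, the biased gradient estimator (BSGE) analyzed in the paper is only stubbed out, and `beta < 0` merely illustrates the negative-momentum variant the abstract mentions.

```python
import numpy as np

def powerball(z, gamma):
    """Elementwise Powerball function: sign(z) * |z|**gamma, gamma in (0, 1)."""
    return np.sign(z) * np.abs(z) ** gamma

def p_sgd_momentum(grad_fn, w0, gamma=0.5, lr=0.1, beta=-0.3, n_iters=100):
    """Hypothetical powered SGD with (possibly negative) momentum.

    grad_fn(w) stands in for a stochastic gradient estimate; the biased
    estimator studied in the paper is paper-specific and not reproduced here.
    """
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)                        # momentum buffer
    for _ in range(n_iters):
        g = grad_fn(w)                          # (biased) stochastic gradient estimate
        v = beta * v + powerball(g, gamma)      # beta < 0 gives negative momentum
        w = w - lr * v                          # powered descent step
    return w
```

With gamma = 1 and beta = 0 the update reduces to plain SGD, which is one way to sanity-check a sketch like this against a known baseline.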
Pages: 13
Related Papers
50 records
  • [1] Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (06) : 1598 - 1606
  • [2] Large-scale machine learning with fast and stable stochastic conjugate gradient
    Yang, Zhuang
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 173
  • [3] Value function gradient learning for large-scale multistage stochastic programming problems
    Lee, Jinkyu
    Bae, Sanghyeon
    Kim, Woo Chang
    Lee, Yongjae
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 308 (01) : 321 - 335
  • [4] SAAGs: Biased stochastic variance reduction methods for large-scale learning
    Chauhan, Vinod Kumar
    Sharma, Anuj
    Dahiya, Kalpana
    APPLIED INTELLIGENCE, 2019, 49 (09) : 3331 - 3361
  • [5] Controllability Maximization of Large-Scale Systems Using Projected Gradient Method
    Sato, Kazuhiro
    Takeda, Akiko
    IEEE CONTROL SYSTEMS LETTERS, 2020, 4 (04): : 821 - 826
  • [6] MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING
    Wiesler, Simon
    Richard, Alexander
    Schlueter, Ralf
    Ney, Hermann
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [7] A Stochastic Quasi-Newton Method for Large-Scale Nonconvex Optimization With Applications
    Chen, Huiming
    Wu, Ho-Chun
    Chan, Shing-Chow
    Lam, Wong-Hing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) : 4776 - 4790
  • [8] On Biased Stochastic Gradient Estimation
    Driggs, Derek
    Liang, Jingwei
    Schonlieb, Carola-Bibiane
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [9] A large-scale stochastic gradient descent algorithm over a graphon
    Chen, Yan
    Li, Tao
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 4806 - 4811
  • [10] Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning
    Liu, Yuanyuan
    Shang, Fanhua
    Liu, Hongying
    Kong, Lin
    Jiao, Licheng
    Lin, Zhouchen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12) : 4242 - 4255