Powered stochastic optimization with hypergradient descent for large-scale learning systems

Cited by: 1
Authors
Yang, Zhuang [1 ]
Li, Xiaotian [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Keywords
Powerball function; Stochastic optimization; Variance reduction; Hypergradient descent; Adaptive learning rate; ALGORITHMS; SELECTION
DOI
10.1016/j.eswa.2023.122017
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Stochastic optimization (SO) algorithms based on the Powerball function, namely powered stochastic optimization (PoweredSO) algorithms, have been shown to be effective and to hold great potential for large-scale optimization and machine learning tasks. Nevertheless, how to determine the learning rate for PoweredSO remains a challenging and unsolved problem. In this paper, we propose a class of adaptive PoweredSO approaches that are efficient, scalable, and robust. They exploit the hypergradient descent (HD) technique to automatically acquire an online learning rate for PoweredSO-like methods. In the first part, we study the behavior of the canonical PoweredSO algorithm, the Powerball stochastic gradient descent (pbSGD) method, combined with HD. Existing PoweredSO algorithms also suffer from high variance because they adopt an algorithmic framework similar to that of SO algorithms, in which the variance arises from the sampling strategy. The second part therefore develops an adaptive powered variance-reduced optimization method that combines a variance-reduction technique with HD. Moreover, we present a convergence analysis of the proposed algorithms and explore their iteration complexity in non-convex settings. Numerical experiments on machine learning tasks verify their superior performance over modern SO algorithms.
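To make the abstract's description concrete, the sketch below pairs a Powerball-transformed stochastic gradient step with a hypergradient-descent (HD) update of the learning rate. It is a minimal illustration, not the authors' implementation: the logistic-regression objective, the synthetic data, the hyperparameter values (alpha, beta, gamma), and the lower bound on alpha are all assumptions made for the example.

```python
# Minimal sketch (assumption, not the paper's reference code): Powerball SGD
# with a hypergradient-descent (HD) learning-rate update, demonstrated on
# L2-regularized logistic regression with synthetic data.
import numpy as np

rng = np.random.default_rng(0)

def powerball(g, gamma):
    """Powerball function: elementwise sign(g) * |g|**gamma."""
    return np.sign(g) * np.abs(g) ** gamma

def grad(w, X, y, lam=1e-3):
    """Mini-batch gradient of the regularized logistic loss."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y) + lam * w

# Synthetic binary-classification data (illustrative only).
n, d = 2000, 50
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)

w = np.zeros(d)
alpha = 0.05      # initial learning rate (assumed value)
beta = 1e-4       # hypergradient step size (assumed value)
gamma = 0.6       # Powerball exponent, 0 < gamma <= 1 (assumed value)
prev_dir = np.zeros(d)

for t in range(500):
    idx = rng.choice(n, 64, replace=False)
    g = grad(w, X[idx], y[idx])
    # HD update: the hypergradient dL/dalpha is approximately -(g . prev_dir),
    # so stepping against it grows alpha when successive directions agree.
    alpha = max(alpha + beta * (g @ prev_dir), 1e-6)
    prev_dir = powerball(g, gamma)   # powered descent direction
    w -= alpha * prev_dir            # pbSGD step with the adapted learning rate
```

The paper's second algorithm additionally applies a variance-reduction technique to the stochastic gradient before the powered transform and the HD update; that correction is omitted from this sketch for brevity.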
Pages: 14
Related papers
50 records in total
  • [1] Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning
    Yang, Zhuang
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [2] Optimal large-scale stochastic optimization of NDCG surrogates for deep learning
    Qiu, Zi-Hao
    Hu, Quanqi
    Zhong, Yongjian
    Tu, Wei-Wei
    Zhang, Lijun
    Yang, Tianbao
    MACHINE LEARNING, 2025, 114 (02)
  • [3] Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (06) : 1598 - 1606
  • [4] Constrained Stochastic Gradient Descent for Large-scale Least Squares Problem
    Mu, Yang
    Ding, Wei
    Zhou, Tianyi
    Tao, Dacheng
    19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13), 2013, : 883 - 891
  • [5] On the flexibility of block coordinate descent for large-scale optimization
    Wang, Xiangfeng
    Zhang, Wenjie
    Yan, Junchi
    Yuan, Xiaoming
    Zha, Hongyuan
    NEUROCOMPUTING, 2018, 272 : 471 - 480
  • [6] Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning
    Liu, Yuanyuan
    Shang, Fanhua
    Liu, Hongying
    Kong, Lin
    Jiao, Licheng
    Lin, Zhouchen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12) : 4242 - 4255
  • [7] Adaptive step size rules for stochastic optimization in large-scale learning
    Yang, Zhuang
    Ma, Li
    STATISTICS AND COMPUTING, 2023, 33 (02)
  • [8] Inertial accelerated stochastic mirror descent for large-scale generalized tensor CP decomposition
    Liu, Zehui
    Wang, Qingsong
    Cui, Chunfeng
    Xia, Yong
    COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2025, : 201 - 233
  • [9] Variance Counterbalancing for Stochastic Large-scale Learning
    Lagari, Pola Lydia
    Tsoukalas, Lefteri H.
    Lagaris, Isaac E.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2020, 29 (05)