Adaptive Biased Stochastic Optimization

Cited by: 0

Authors:

Yang, Zhuang [1]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2025 / Vol. 47 / No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

Stochastic processes; Optimization; Radio frequency; Convergence; Machine learning algorithms; Machine learning; Complexity theory; Numerical models; Adaptation models; Support vector machines; Stochastic optimization; biased gradient estimation; convergence analysis; numerical stability; adaptivity; CONJUGATE-GRADIENT METHOD; DESCENT;

DOI:

10.1109/TPAMI.2025.3528193

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work develops and analyzes a class of adaptive biased stochastic optimization (ABSO) algorithms from the perspective of the GEneralized Adaptive gRadient (GEAR) method that contains Adam, AdaGrad, RMSProp, etc. Particularly, two preferred biased stochastic optimization (BSO) algorithms, the biased stochastic variance reduction gradient (BSVRG) algorithm and the stochastic recursive gradient algorithm (SARAH), equipped with GEAR, are first considered in this work, leading to two ABSO algorithms: BSVRG-GEAR and SARAH-GEAR. We present a uniform analysis of ABSO algorithms for minimizing strongly convex (SC) and Polyak-Łojasiewicz (PŁ) composite objective functions. Second, we also use our framework to develop another novel BSO algorithm, adaptive biased stochastic conjugate gradient (coined BSCG-GEAR), which achieves the well-known oracle complexity. Specifically, under mild conditions, we prove that the resulting ABSO algorithms attain a linear convergence rate on both PŁ and SC cases. Moreover, we show that the complexity of the resulting ABSO algorithms is comparable to that of advanced stochastic gradient-based algorithms. Finally, we demonstrate the empirical superiority and the numerical stability of the resulting ABSO algorithms by conducting numerical experiments on different applications of machine learning.
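
To make the notion of a biased gradient estimator concrete, the following is a minimal sketch of the standard SARAH recursion that the abstract names as one of its base algorithms. It is not the paper's SARAH-GEAR: the abstract does not specify the GEAR scaling, so a plain constant step size is used instead. The ridge-regularized least-squares objective, the synthetic data, and all names (`full_grad`, `sample_grad`, `sarah`, `lam`, `epochs`) are assumptions made for illustration only.

```python
import numpy as np

def full_grad(A, b, lam, w):
    # Gradient of (1/2n)*||A w - b||^2 + (lam/2)*||w||^2 (assumed objective).
    return A.T @ (A @ w - b) / A.shape[0] + lam * w

def sample_grad(A, b, lam, w, i):
    # Gradient of the i-th component: 0.5*(a_i^T w - b_i)^2 + (lam/2)*||w||^2.
    return A[i] * (A[i] @ w - b[i]) + lam * w

def sarah(A, b, lam=0.1, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Largest per-sample smoothness constant; constant step 0.5 / L_max.
    L_max = np.max(np.sum(A * A, axis=1)) + lam
    eta = 0.5 / L_max
    m = 2 * n  # inner-loop length (an illustrative choice)
    w = np.zeros(d)
    for _ in range(epochs):
        # Outer step: restart the estimator from the exact full gradient.
        v = full_grad(A, b, lam, w)
        w_prev, w = w, w - eta * v
        for _ in range(m - 1):
            i = rng.integers(n)
            # Recursive estimator: in general E[v | past] != full gradient
            # at w, which is why SARAH counts as a biased method.
            v = sample_grad(A, b, lam, w, i) - sample_grad(A, b, lam, w_prev, i) + v
            w_prev, w = w, w - eta * v
    return w

# Small synthetic problem (hypothetical data, fixed seed for repeatability).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
b = A @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)
lam = 0.1

g0 = np.linalg.norm(full_grad(A, b, lam, np.zeros(10)))
w_hat = sarah(A, b, lam=lam)
g_end = np.linalg.norm(full_grad(A, b, lam, w_hat))
print(g0, g_end)
```

The objective here is strongly convex because of the `lam` term, which matches one of the two settings (SC and PŁ) the abstract analyzes. An adaptive variant in the spirit of the abstract would replace the constant `eta` with a per-coordinate scaling built from past estimates, as Adam, AdaGrad, and RMSProp do.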

Pages: 3067 - 3078

Page count: 12
