Sparse Learning with Stochastic Composite Optimization

Cited: 16
Authors
Zhang, Weizhong [1 ]
Zhang, Lijun [2 ]
Jin, Zhongming [1 ]
Jin, Rong [3 ]
Cai, Deng [1 ]
Li, Xuelong [4 ]
Liang, Ronghua [5 ]
He, Xiaofei [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, State Key Lab CAD&CG, 388 Yuhang Tang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[3] Alibaba Grp, Seattle, WA 98057 USA
[4] Chinese Acad Sci, State Key Lab Transient Opt & Photon, Xian Inst Opt & Precis Mech, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710119, Shaanxi, Peoples R China
[5] Zhejiang Univ Technol, Coll Informat Engn, 288 Liuhe Rd, Hangzhou 310058, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sparse learning; stochastic optimization; stochastic composite optimization; ONLINE; ALGORITHMS;
DOI
10.1109/TPAMI.2016.2578323
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution by minimizing a composite objective function. Most recent SCO algorithms attain the optimal expected convergence rate O(1/(lambda T)), but they often fail to deliver a sparse final solution, either because the sparsity-inducing regularization is applied only weakly during stochastic optimization (SO) or because of limitations in the online-to-batch conversion. Moreover, even when the objective function is strongly convex, their high-probability bounds only attain O(sqrt(log(1/delta)/T)), where delta is the failure probability, which is much worse than the expected convergence rate. To address these limitations, we propose a simple yet effective two-phase Stochastic Composite Optimization scheme that adds a novel and powerful sparse online-to-batch conversion to general Stochastic Optimization algorithms. Under this scheme we develop three concrete algorithms, OptimalSL, LastSL and AverageSL, to demonstrate its effectiveness. Both the theoretical analysis and the experimental results show that our methods outperform existing methods in their ability to learn sparse solutions, while improving the high-probability bound to approximately O(log(log(T)/delta)/(lambda T)).
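For intuition, the composite update that SCO algorithms of this kind build on can be illustrated with a generic stochastic proximal gradient sketch on an l1-regularized least-squares problem, using the 1/(lambda*t) step sizes associated with the strongly convex rate quoted above. This is only a minimal illustration of the composite (forward-backward) step, not the paper's OptimalSL, LastSL, or AverageSL algorithms and not its sparse online-to-batch conversion; all function names and parameter values here are illustrative.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_sgd_l1(X, y, lam, reg, T, rng):
    # Stochastic proximal gradient for (1/2n)||Xw - y||^2 + reg * ||w||_1,
    # with step sizes eta_t = 1/(lam * t) as in strongly convex SCO analyses.
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)                       # sample one data point
        grad = (X[i] @ w - y[i]) * X[i]           # stochastic gradient of smooth part
        eta = 1.0 / (lam * t)
        w = soft_threshold(w - eta * grad, eta * reg)  # composite (proximal) step
    return w

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
w_true = np.zeros(d)
w_true[:k] = 1.0                                  # ground truth has 5 nonzeros
X = rng.standard_normal((n, d))
y = X @ w_true + 0.01 * rng.standard_normal(n)
w = prox_sgd_l1(X, y, lam=1.0, reg=0.05, T=5000, rng=rng)
print(int(np.sum(np.abs(w) > 1e-6)))              # number of nonzero coordinates
```

Note that the nonzero count of the last iterate is typically larger than the true support size: the per-step shrinkage eta_t * reg vanishes as t grows, so the final iterate need not be sparse. This is precisely the weakness of plain stochastic optimization that the paper's sparse online-to-batch conversion is designed to address.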
Pages: 1223-1236 (14 pages)