Sparse Learning with Stochastic Composite Optimization

Cited by: 16
Authors
Zhang, Weizhong [1 ]
Zhang, Lijun [2 ]
Jin, Zhongming [1 ]
Jin, Rong [3 ]
Cai, Deng [1 ]
Li, Xuelong [4 ]
Liang, Ronghua [5 ]
He, Xiaofei [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, State Key Lab CAD&CG, 388 Yuhang Tang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[3] Alibaba Grp, Seattle, WA 98057 USA
[4] Chinese Acad Sci, State Key Lab Transient Opt & Photon, Xian Inst Opt & Precis Mech, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710119, Shaanxi, Peoples R China
[5] Zhejiang Univ Technol, Coll Informat Engn, 288 Liuhe Rd, Hangzhou 310058, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sparse learning; stochastic optimization; stochastic composite optimization; ONLINE; ALGORITHMS;
DOI
10.1109/TPAMI.2016.2578323
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution of a composite objective function. Most recent SCO algorithms attain the optimal expected convergence rate O(1/(lambda T)), but they often fail to deliver a sparse solution at the end, either because sparsity regularization is applied only loosely during stochastic optimization (SO) or because of limitations in the online-to-batch conversion. Moreover, even when the objective function is strongly convex, their high-probability bounds can only attain O(sqrt(log(1/delta)/T)), where delta is the failure probability, which is much worse than the expected convergence rate. To address these limitations, we propose a simple yet effective two-phase Stochastic Composite Optimization scheme that adds a novel, powerful sparse online-to-batch conversion to general Stochastic Optimization algorithms. We further develop three concrete algorithms under this scheme, OptimalSL, LastSL, and AverageSL, to demonstrate its effectiveness. Both the theoretical analysis and the experimental results show that our methods outperform existing methods in sparse learning, while the high-probability bound improves to approximately O(log(log(T)/delta)/(lambda T)).
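To make the two-phase idea concrete, the following is a minimal Python sketch, not the paper's actual OptimalSL, LastSL, or AverageSL algorithms: phase one runs a standard composite stochastic optimization method (proximal SGD with an l1 regularizer is assumed here as the base SO algorithm), and phase two applies a simple sparse online-to-batch conversion (suffix averaging followed by re-thresholding, since plain averaging destroys sparsity). The step-size schedule, the strong-convexity constant mu, and the final threshold theta are illustrative assumptions, not values from the paper.

# Illustrative sketch only: proximal SGD (phase 1) plus a simple sparse
# online-to-batch conversion (phase 2). The paper's conversion is more
# principled; mu and theta here are assumptions for the sake of the example.
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def two_phase_sco(X, y, lam=0.1, mu=0.1, theta=0.05, T=2000, seed=0):
    # Minimize (1/2) E[(x_i^T w - y_i)^2] + lam * ||w||_1 by proximal SGD,
    # assuming the smooth part is mu-strongly convex (step size 1/(mu*t)).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    suffix_sum = np.zeros(d)
    count = 0
    for t in range(1, T + 1):
        i = rng.integers(n)
        eta = 1.0 / (mu * t)                           # strongly convex step schedule
        grad = (X[i] @ w - y[i]) * X[i]                # stochastic gradient of squared loss
        w = soft_threshold(w - eta * grad, eta * lam)  # composite (proximal) step
        if t > T // 2:                                 # phase 2 input: suffix of iterates
            suffix_sum += w
            count += 1
    w_bar = suffix_sum / count                         # suffix averaging (online-to-batch)
    return soft_threshold(w_bar, theta)                # re-sparsify the averaged solution

# Tiny usage example on synthetic sparse data.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(500)
w_hat = two_phase_sco(X, y)
print("nonzeros:", np.count_nonzero(w_hat))

The re-thresholding in phase two is what keeps the returned solution genuinely sparse; without it, the suffix average would typically be dense even when every individual iterate is sparse.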
Pages: 1223-1236
Page count: 14