Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning

Cited: 21
Authors
Liu, Yuanyuan [1 ]
Shang, Fanhua [1 ,2 ]
Liu, Hongying [1 ]
Kong, Lin [1 ]
Jiao, Licheng [1 ]
Lin, Zhouchen [3 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Minist Educ, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Shaanxi, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] Peking Univ, Sch EECS, Key Lab Machine Percept MoE, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Stochastic optimization; ADMM; variance reduction; momentum acceleration; strongly convex and non-strongly convex; smooth and non-smooth; ALTERNATING DIRECTION METHOD; DUAL COORDINATE ASCENT; CONVERGENCE; MULTIPLIERS; ALGORITHMS; DESCENT;
DOI
10.1109/TPAMI.2020.3000512
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recently, many stochastic variance-reduced alternating direction methods of multipliers (ADMMs) (e.g., SAG-ADMM and SVRG-ADMM) have made exciting progress, such as a linear convergence rate for strongly convex (SC) problems. However, their best-known convergence rate for non-strongly convex (non-SC) problems is O(1/T), as opposed to the O(1/T^2) rate of accelerated deterministic algorithms, where T is the number of iterations. Thus, there remains a gap between the convergence rates of existing stochastic ADMMs and deterministic algorithms. To bridge this gap, we introduce a new momentum acceleration trick into stochastic variance-reduced ADMM and propose a novel accelerated SVRG-ADMM method (called ASVRG-ADMM) for machine learning problems with the constraint Ax + By = c. We then design a linearized proximal update rule and a simple proximal rule for the two classes of ADMM-style problems with B = τI and B ≠ τI, respectively, where I is the identity matrix and τ is an arbitrary bounded constant. Note that our linearized proximal update rule avoids solving sub-problems iteratively. Moreover, we prove that ASVRG-ADMM converges linearly for SC problems. In particular, ASVRG-ADMM improves the convergence rate from O(1/T) to O(1/T^2) for non-SC problems. Finally, we apply ASVRG-ADMM to various machine learning problems, e.g., graph-guided fused Lasso, graph-guided logistic regression, graph-guided SVM, generalized graph-guided fused Lasso, and multi-task learning, and show that ASVRG-ADMM consistently converges faster than state-of-the-art methods.
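To make the abstract's ingredients concrete, below is a minimal, hedged sketch of an accelerated variance-reduced stochastic ADMM loop applied to graph-guided fused Lasso, i.e., min_w (1/2n)||Xw - b||^2 + λ||Aw||_1 rewritten as f(w) + λ||y||_1 subject to Aw - y = 0 (so B = -I). It combines an SVRG-style gradient estimator, a Katyusha-style momentum mixing with the snapshot, and a linearized proximal w-update that avoids an inner sub-problem solver. The step size eta, penalty rho, momentum weight theta, and epoch schedule are illustrative placeholders, not the tuned choices or exact update rules from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_vr_stochastic_admm_sketch(X, b, A, lam=0.1, rho=1.0, eta=0.01,
                                           theta=0.5, epochs=30, seed=0):
    """Illustrative sketch only: not the authors' exact ASVRG-ADMM algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = A.shape[0]
    w_tilde = np.zeros(d)                      # snapshot point for the full gradient
    w = w_tilde.copy()                         # current primal iterate
    z = w.copy()                               # auxiliary variable carrying the momentum
    y = np.zeros(m)                            # splitting variable for A w
    u = np.zeros(m)                            # scaled dual variable
    L_lin = rho * np.linalg.norm(A, 2) ** 2    # linearization constant for the w-update

    for _ in range(epochs):
        # full gradient of the smooth loss at the snapshot (computed once per epoch)
        full_grad = X.T @ (X @ w_tilde - b) / n
        z = w.copy()
        for _ in range(n):
            i = rng.integers(n)
            xi, bi = X[i], b[i]
            # SVRG-style variance-reduced stochastic gradient of f at w
            g = xi * (xi @ w - bi) - xi * (xi @ w_tilde - bi) + full_grad
            # add the linearized augmented-Lagrangian term (no inner sub-problem)
            g_admm = g + rho * A.T @ (A @ w - y + u)
            z = z - g_admm / (1.0 / eta + L_lin)
            # momentum-weighted combination with the snapshot (Katyusha-style mixing)
            w = w_tilde + theta * (z - w_tilde)
            # proximal y-update (soft-thresholding) and dual ascent
            y = soft_threshold(A @ w + u, lam / rho)
            u = u + A @ w - y
        w_tilde = w.copy()                     # refresh the snapshot after each epoch
    return w
```

The momentum mixing toward the snapshot and the linearized w-step are the two features the abstract emphasizes; everything else (sampling scheme, dual update, stopping rule) follows a generic stochastic ADMM template and would need to be adapted to the paper's precise formulation.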
Pages: 4242-4255
Number of Pages: 14