Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning

Cited: 21
Authors
Liu, Yuanyuan [1 ]
Shang, Fanhua [1 ,2 ]
Liu, Hongying [1 ]
Kong, Lin [1 ]
Jiao, Licheng [1 ]
Lin, Zhouchen [3 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Minist Educ, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Shaanxi, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] Peking Univ, Sch EECS, Key Lab Machine Percept MoE, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Stochastic optimization; ADMM; variance reduction; momentum acceleration; strongly convex and non-strongly convex; smooth and non-smooth; ALTERNATING DIRECTION METHOD; DUAL COORDINATE ASCENT; CONVERGENCE; MULTIPLIERS; ALGORITHMS; DESCENT;
DOI
10.1109/TPAMI.2020.3000512
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recently, many stochastic variance-reduced alternating direction methods of multipliers (ADMMs) (e.g., SAG-ADMM and SVRG-ADMM) have made exciting progress, such as a linear convergence rate for strongly convex (SC) problems. However, their best-known convergence rate for non-strongly convex (non-SC) problems is O(1/T), as opposed to the O(1/T^2) rate of accelerated deterministic algorithms, where T is the number of iterations. Thus, there remains a gap between the convergence rates of existing stochastic ADMMs and deterministic algorithms. To bridge this gap, we introduce a new momentum acceleration trick into stochastic variance-reduced ADMM and propose a novel accelerated SVRG-ADMM method (called ASVRG-ADMM) for machine learning problems with the constraint Ax + By = c. We then design a linearized proximal update rule and a simple proximal rule for the two classes of ADMM-style problems with B = τI and B ≠ τI, respectively, where I is the identity matrix and τ is an arbitrary bounded constant. Note that our linearized proximal update rule avoids solving sub-problems iteratively. Moreover, we prove that ASVRG-ADMM converges linearly for SC problems. In particular, ASVRG-ADMM improves the convergence rate from O(1/T) to O(1/T^2) for non-SC problems. Finally, we apply ASVRG-ADMM to various machine learning problems, e.g., graph-guided fused Lasso, graph-guided logistic regression, graph-guided SVM, generalized graph-guided fused Lasso, and multi-task learning, and show that ASVRG-ADMM consistently converges faster than state-of-the-art methods.
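To make the abstract's ingredients concrete, below is a minimal, hedged sketch of an accelerated variance-reduced stochastic ADMM loop applied to graph-guided fused Lasso, i.e., min_w (1/2n)||Xw - b||^2 + λ||Aw||_1 rewritten as f(w) + λ||y||_1 subject to Aw - y = 0 (so B = -I). It combines an SVRG-style gradient estimator, a Katyusha-style momentum mixing with the snapshot, and a linearized proximal w-update that avoids an inner sub-problem solver. The step size eta, penalty rho, momentum weight theta, and epoch schedule are illustrative placeholders, not the tuned choices or exact update rules from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_vr_stochastic_admm_sketch(X, b, A, lam=0.1, rho=1.0, eta=0.01,
                                           theta=0.5, epochs=30, seed=0):
    """Illustrative sketch only: not the authors' exact ASVRG-ADMM algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = A.shape[0]
    w_tilde = np.zeros(d)                      # snapshot point for the full gradient
    w = w_tilde.copy()                         # current primal iterate
    z = w.copy()                               # auxiliary variable carrying the momentum
    y = np.zeros(m)                            # splitting variable for A w
    u = np.zeros(m)                            # scaled dual variable
    L_lin = rho * np.linalg.norm(A, 2) ** 2    # linearization constant for the w-update

    for _ in range(epochs):
        # full gradient of the smooth loss at the snapshot (computed once per epoch)
        full_grad = X.T @ (X @ w_tilde - b) / n
        z = w.copy()
        for _ in range(n):
            i = rng.integers(n)
            xi, bi = X[i], b[i]
            # SVRG-style variance-reduced stochastic gradient of f at w
            g = xi * (xi @ w - bi) - xi * (xi @ w_tilde - bi) + full_grad
            # add the linearized augmented-Lagrangian term (no inner sub-problem)
            g_admm = g + rho * A.T @ (A @ w - y + u)
            z = z - g_admm / (1.0 / eta + L_lin)
            # momentum-weighted combination with the snapshot (Katyusha-style mixing)
            w = w_tilde + theta * (z - w_tilde)
            # proximal y-update (soft-thresholding) and dual ascent
            y = soft_threshold(A @ w + u, lam / rho)
            u = u + A @ w - y
        w_tilde = w.copy()                     # refresh the snapshot after each epoch
    return w
```

The momentum mixing toward the snapshot and the linearized w-step are the two features the abstract emphasizes; everything else (sampling scheme, dual update, stopping rule) follows a generic stochastic ADMM template and would need to be adapted to the paper's precise formulation.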
Pages: 4242-4255
Number of Pages: 14