Markov chain block coordinate descent

Cited by: 0
Authors
Tao Sun
Yuejiao Sun
Yangyang Xu
Wotao Yin
Affiliations
[1] National University of Defense Technology,National Lab for Parallel and Distributed Processing, College of Computer
[2] University of California,Department of Mathematics
[3] Rensselaer Polytechnic Institute,Department of Mathematical Sciences
Source
Computational Optimization and Applications | 2020 / Vol. 75
Keywords
Block coordinate gradient descent; Markov chain; Markov chain Monte Carlo; Markov decision process; Decentralized optimization; Primary 90C26; Secondary 90C40; 68W15;
DOI: not available
Abstract
Block coordinate gradient descent (BCD) is a powerful method for large-scale optimization. This paper considers a BCD method that successively updates a series of blocks selected according to a Markov chain. This block selection rule is neither i.i.d. random nor cyclic; rather, it is a natural choice for some applications in distributed optimization and Markov decision processes, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which may be nonconvex. When the functions are convex or strongly convex, we establish sublinear and linear convergence rates, respectively. We also present a Markov chain inertial BCD method. Finally, we discuss potential applications.
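The Markov-chain block selection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it runs BCD on a strongly convex quadratic, with the next block drawn from a lazy random walk on a cycle of blocks (a chain that is neither i.i.d. nor cyclic); the problem sizes, transition matrix, and step size are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly convex quadratic f(x) = 0.5 x^T A x - b^T x on R^8, split into 4 blocks of size 2.
n, n_blocks, bs = 8, 4, 2
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)          # positive definite Hessian
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)   # exact minimizer, kept only for checking the result

# Markov chain on the block indices: lazy random walk on a cycle.
P = np.zeros((n_blocks, n_blocks))
for i in range(n_blocks):
    P[i, i] = 0.5
    P[i, (i - 1) % n_blocks] = 0.25
    P[i, (i + 1) % n_blocks] = 0.25

x = np.zeros(n)
block = 0
step = 1.0 / np.linalg.eigvalsh(A).max()  # step size from the global Lipschitz constant

for _ in range(5000):
    idx = slice(block * bs, (block + 1) * bs)
    grad_block = A[idx] @ x - b[idx]          # partial gradient w.r.t. the active block only
    x[idx] -= step * grad_block               # block coordinate gradient step
    block = rng.choice(n_blocks, p=P[block])  # next block sampled from the Markov chain

print(np.linalg.norm(x - x_star))  # residual shrinks toward zero
```

Because the chain is irreducible and aperiodic, every block is updated infinitely often, which is the property the paper's mixing-time analysis exploits; in a distributed setting the chain would instead be a walk over a communication graph.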
Pages: 35–61 (26 pages)