Accelerated proximal stochastic variance reduction for DC optimization

Authors
Lulu He
Jimin Ye
Jianwei E
Affiliation
[1] Xidian University, School of Mathematics and Statistics
Source
Neural Computing and Applications | 2021 / Volume 33
Keywords
Stochastic difference-of-convex programming; Variance reduction; Proximal operator; Kurdyka-Łojasiewicz property
Abstract
In this article, we study an important class of stochastic difference-of-convex (SDC) programming problems whose objective is the sum of a continuously differentiable convex function, a simple convex function, and a continuous concave function. Recently, a proximal stochastic variance reduction difference-of-convex algorithm (Prox-SVRDCA) (Xu et al., 2019) was developed for this problem. Prox-SVRDCA reduces to the proximal stochastic variance reduction gradient method (Prox-SVRG) (Xiao and Zhang, 2014) when the concave part vanishes, and hence it is potentially slow in practice. Inspired by recently proposed acceleration techniques, we propose an accelerated proximal stochastic variance reduction difference-of-convex algorithm (AProx-SVRDCA). Unlike Prox-SVRDCA, AProx-SVRDCA incorporates an extrapolation acceleration step that involves the latest two iterates. Experimental results show that acceleration is achieved for a fairly general choice of the extrapolation parameter. We then present a rigorous theoretical analysis. We first show that any accumulation point of the generated iterate sequence is a stationary point of the objective function. Furthermore, in contrast to the traditional convergence analysis in existing nonconvex stochastic optimization, we establish a global convergence property of the generated sequence under the Kurdyka-Łojasiewicz property together with the continuity and differentiability of the concave part of the objective. To the best of our knowledge, this is the first time an acceleration technique has been incorporated into nonconvex nonsmooth SDC programming. Finally, extended experimental results demonstrate the superiority of the proposed algorithm.
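The abstract describes the method only at a high level: an SVRG-style variance-reduced gradient step for the smooth convex part, a proximal step for the simple convex part, an extrapolation step built from the latest two iterates, and the concave part handled through its gradient. A minimal sketch of such an update loop, under those assumptions (this is not the authors' code; all function names, step sizes, and the fixed extrapolation parameter are illustrative):

```python
import numpy as np

def aprox_svrdca_sketch(grad_fi, prox_g, grad_h, x0, n, eta=0.1, beta=0.3,
                        epochs=5, inner=20, seed=0):
    """Illustrative sketch of an extrapolated proximal SVRG loop for an
    objective F(x) = (1/n) * sum_i f_i(x) + g(x) + h(x), with each f_i
    smooth convex, g simple convex, and h continuously differentiable
    concave (linearized via its gradient at each step).

    grad_fi(i, x): gradient of the i-th smooth component f_i at x
    prox_g(v, t):  proximal operator of g with step size t
    grad_h(x):     gradient of the concave part at x
    """
    rng = np.random.default_rng(seed)
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(epochs):
        # Full gradient at the snapshot point (variance-reduction anchor).
        snapshot = x.copy()
        full_grad = np.mean([grad_fi(i, snapshot) for i in range(n)], axis=0)
        for _ in range(inner):
            # Extrapolation using the latest two iterates.
            y = x + beta * (x - x_prev)
            i = rng.integers(n)
            # SVRG-style variance-reduced stochastic gradient at y,
            # plus the linearization of the concave part.
            v = grad_fi(i, y) - grad_fi(i, snapshot) + full_grad + grad_h(y)
            x_prev = x
            x = prox_g(y - eta * v, eta)
    return x
```

As a sanity check of the sketch: with g = 0 (identity prox) and a zero concave part, the loop reduces to extrapolated SVRG on a quadratic finite sum such as f_i(x) = ½‖x − a_i‖², and the iterates converge to the mean of the a_i.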
Pages: 13163-13181 (18 pages)