RES: Regularized Stochastic BFGS Algorithm

Cited by: 121
Authors
Mokhtari, Aryan [1 ]
Ribeiro, Alejandro [1 ]
Affiliations
[1] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
Funding
U.S. National Science Foundation;
Keywords
Quasi-Newton methods; large-scale optimization; stochastic optimization; support vector machines; APPROXIMATION; CONVERGENCE;
DOI
10.1109/TSP.2014.2357775
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
RES, a regularized stochastic version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method, is proposed to solve strongly convex optimization problems with stochastic objectives. The use of stochastic gradient descent algorithms is widespread, but the number of iterations required to approximate optimal arguments can be prohibitive in high dimensional problems. Application of second-order methods, on the other hand, is impracticable because the computation of objective function Hessian inverses incurs excessive computational cost. BFGS modifies gradient descent by introducing a Hessian approximation matrix computed from finite gradient differences. RES utilizes stochastic gradients in lieu of deterministic gradients for both the determination of descent directions and the approximation of the objective function's curvature. Since stochastic gradients can be computed at manageable computational cost, RES is realizable and retains the convergence rate advantages of its deterministic counterparts. Convergence results show that lower and upper bounds on the Hessian eigenvalues of the sample functions are sufficient to guarantee almost sure convergence of a subsequence generated by RES and convergence of the sequence in expectation to optimal arguments. Numerical experiments showcase reductions in convergence time relative to stochastic gradient descent algorithms and non-regularized stochastic versions of BFGS. An application of RES to the implementation of support vector machines is developed.
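The record carries only the abstract, so the following is a minimal Python sketch of what a regularized stochastic BFGS iteration can look like on a toy strongly convex quadratic. The step-size schedule, the constants gamma and delta, and the exact form of the regularized curvature update are assumptions inferred from the abstract, not the RES update as specified in the paper.

import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.normal(size=(d, d))
Q = A @ A.T / d + np.eye(d)      # Hessian of the expected objective (strongly convex)
x_opt = rng.normal(size=d)

def stochastic_grad(x, noise):
    # Noisy gradient of f(x) = 0.5*(x - x_opt)' Q (x - x_opt); "noise" stands in
    # for the random sample shared by both gradient evaluations of an iteration.
    return Q @ (x - x_opt) + noise

# Hypothetical hyper-parameters; the paper's constants and schedules differ.
gamma, delta = 0.1, 0.01
eps0, T0 = 0.5, 100

x = np.zeros(d)
H = np.eye(d)                    # approximate inverse Hessian

for t in range(500):
    eps = eps0 * T0 / (T0 + t)   # diminishing step size
    noise = 0.1 * rng.normal(size=d)
    g = stochastic_grad(x, noise)
    # Regularized descent direction: (H + gamma*I) times the stochastic gradient.
    x_new = x - eps * (H @ g + gamma * g)
    # Curvature pair built from the same random sample at both iterates.
    g_new = stochastic_grad(x_new, noise)
    v = x_new - x
    r = (g_new - g) - delta * v  # regularized gradient variation (assumed form)
    if v @ r > 1e-10:            # skip updates that would be ill defined
        rho = 1.0 / (v @ r)
        I = np.eye(d)
        # Standard BFGS update of the inverse Hessian approximation.
        H = (I - rho * np.outer(v, r)) @ H @ (I - rho * np.outer(r, v)) + rho * np.outer(v, v)
    x = x_new

print("distance to optimum:", np.linalg.norm(x - x_opt))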
Pages: 6089-6104
Number of pages: 16