Asynchronous parallel stochastic Quasi-Newton methods

Cited: 4
Authors
Tong, Qianqian [1 ]
Liang, Guannan [1 ]
Cai, Xingyu [2 ]
Zhu, Chunjiang [1 ]
Bi, Jinbo [1 ]
Affiliations
[1] Univ Connecticut, Storrs, CT 06269 USA
[2] Baidu USA, Sunnyvale, CA 94089 USA
Keywords
Quasi-Newton method; Asynchronous parallel; Stochastic algorithm; Variance reduction; Superlinear convergence; Descent
DOI
10.1016/j.parco.2020.102721
CLC number
TP301 [Theory, Methods];
Subject classification code
081202 ;
Abstract
Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force behind scaling up machine learning models such as deep neural nets, second-order quasi-Newton methods have started to draw attention due to their effectiveness in dealing with ill-conditioned optimization problems. The L-BFGS method is one of the most widely used quasi-Newton methods. We propose an asynchronous parallel algorithm for the stochastic quasi-Newton (AsySQN) method. Unlike prior attempts, which parallelize only the gradient calculation or the two-loop recursion of L-BFGS, our algorithm is the first to truly parallelize L-BFGS with a convergence guarantee. By adopting a variance-reduction technique, a prior stochastic L-BFGS method, which was not designed for parallel computing, achieves a linear convergence rate. We prove that our asynchronous parallel scheme maintains the same linear convergence rate while achieving significant speedup. Empirical evaluations on both simulated and benchmark datasets demonstrate the speedup over the non-parallel stochastic L-BFGS, as well as better performance than first-order methods in solving ill-conditioned problems.
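For context on the component the abstract refers to: the two-loop recursion is the standard L-BFGS procedure for applying the approximate inverse Hessian to a gradient without ever forming a matrix, using only the m most recent curvature pairs (s_i, y_i). The sketch below shows that textbook recursion (not the authors' AsySQN algorithm itself); the function name and NumPy-based interface are illustrative.

```python
import numpy as np

def two_loop_recursion(grad, history):
    """Textbook L-BFGS two-loop recursion.

    grad    : current (stochastic) gradient, a 1-D NumPy array.
    history : list of (s, y) curvature pairs, oldest first, where
              s = x_{k+1} - x_k and y = grad_{k+1} - grad_k.
    Returns an approximation of H^{-1} @ grad in O(m * d) time.
    """
    q = grad.copy()
    alphas = []
    # First loop: walk the history from newest to oldest.
    for s, y in reversed(history):
        rho = 1.0 / y.dot(s)
        alpha = rho * s.dot(q)
        q -= alpha * y
        alphas.append(alpha)          # alphas[0] belongs to the newest pair
    # Scale by the standard initial Hessian approximation gamma * I.
    s_k, y_k = history[-1]
    gamma = s_k.dot(y_k) / y_k.dot(y_k)
    r = gamma * q
    # Second loop: walk the history from oldest to newest.
    for (s, y), alpha in zip(history, reversed(alphas)):
        rho = 1.0 / y.dot(s)
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return r
```

The abstract's point is that earlier parallel attempts parallelized only this recursion (or the gradient evaluation feeding it), whereas AsySQN runs the whole stochastic L-BFGS update asynchronously across workers.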
Pages: 12