Distributed and asynchronous Stochastic Gradient Descent with variance reduction

Cited by: 19
Authors
Ming, Yuewei [1]
Zhao, Yawei [1]
Wu, Chengkun [1]
Li, Kuan [1]
Yin, Jianping [2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Dongguan Univ Technol, Dongguan 523808, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Stochastic Gradient Descent; Variance reduction; Asynchronous communication protocol; Distributed machine learning algorithms
DOI
10.1016/j.neucom.2017.11.044
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Stochastic Gradient Descent (SGD) with variance reduction techniques has proven powerful for training the parameters of various machine learning models. However, it does not extend trivially to distributed systems because of its intrinsic design. Although conventional systems such as PetuumSGD perform well on distributed machine learning tasks, they focus mainly on optimizing the communication protocol and do not exploit the potential benefits of a specific machine learning algorithm. In this paper, we analyze the asynchronous communication protocol in PetuumSGD and propose a distributed version of variance-reduced SGD named DisSVRG. DisSVRG adopts the variance reduction technique to update the parameters of a model; the newly learned parameters are then shared across nodes using the asynchronous communication protocol. Furthermore, we accelerate DisSVRG with an adaptive learning rate driven by an acceleration factor, and we propose an adaptive sampling strategy. Together, these methods greatly reduce the wait time during iterations and significantly accelerate the convergence of DisSVRG. Extensive empirical studies verify that DisSVRG converges faster than state-of-the-art variants of SGD and attains almost linear speedup in a cluster. (c) 2017 Elsevier B.V. All rights reserved.
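This record does not reproduce the paper's pseudocode, but the SVRG-style variance-reduced update that the abstract says DisSVRG builds on can be sketched as below. This is a minimal serial sketch, not the distributed DisSVRG algorithm: the asynchronous parameter sharing, adaptive learning rate, and adaptive sampling are omitted, and the names svrg, grad_i, eta, and inner_steps are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def svrg(grad_i, w0, n, eta=0.01, epochs=20, inner_steps=None):
    """Serial SVRG sketch: variance-reduced SGD over n component functions.

    grad_i(w, i) returns the gradient of the i-th component at w.
    """
    inner_steps = inner_steps or 2 * n
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot: the anchor that reduces the
        # variance of subsequent stochastic gradients.
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = np.random.randint(n)
            # Variance-reduced stochastic gradient: unbiased for the full
            # gradient, with variance shrinking as w approaches w_snap.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= eta * g
    return w

# Toy usage: least-squares components f_i(w) = 0.5 * (x_i @ w - y_i)**2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)
grad = lambda w, i: (X[i] @ w - y[i]) * X[i]
w = svrg(grad, np.zeros(5), n=100)
```

In the distributed setting described by the abstract, each node would run such inner updates locally and exchange the learned parameters asynchronously, rather than waiting for a global synchronization barrier after every epoch.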
Pages: 27-36 (10 pages)