Intermittent Pulling With Local Compensation for Communication-Efficient Distributed Learning

Cited by: 3
Authors
Wang, Haozhao [1,2]
Qu, Zhihao [1,2]
Guo, Song [2]
Gao, Xin [1]
Li, Ruixuan [1]
Ye, Baoliu [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computational modeling; Servers; Convergence; Radio frequency; Training; Stochastic processes; Machine learning; Distributed machine learning; parameter server; communication compression; local compensation; PARADIGM;
DOI
10.1109/TETC.2020.3043300
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
As a widely used iterative algorithm, distributed Stochastic Gradient Descent (SGD) has greatly advanced the training of machine learning models by reducing gradient computation time. However, the large number of iterations SGD requires incurs a heavy communication cost for pushing local gradients and pulling the global model, which limits further performance improvement. In this article, to reduce the number of pulling operations, we propose a novel approach named Pulling Reduction with Local Compensation (PRLC), in which each worker pulls the global model from the server only intermittently and uses its own local updates to compensate for the gap between its local model and the global model. Our rigorous theoretical analysis shows that the convergence rate of PRLC preserves the same order as classical synchronous SGD in both the strongly-convex and non-convex cases, and that PRLC scales well owing to a linear speedup in the number of training nodes. Moreover, we show that PRLC admits a lower pulling frequency than pulling reduction without local compensation. Extensive experiments on various models show that our approach achieves a significant pulling reduction over state-of-the-art methods, e.g., requiring only half the pulling operations of LAG.
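
A minimal, single-process sketch may make the scheme in the abstract concrete. It is a plausible reading of the abstract, not the paper's exact algorithm: names such as PULL_INTERVAL and the toy quadratic objectives are illustrative assumptions. Each worker pushes a gradient to the server at every step but pulls the global model only every PULL_INTERVAL steps; between pulls it keeps applying its own updates to its stale copy, which plays the role of the local compensation.

import numpy as np

# PRLC-style intermittent pulling with local compensation (illustrative
# sketch; PULL_INTERVAL and the quadratic objectives are assumptions).
np.random.seed(0)
DIM, WORKERS, STEPS, LR, PULL_INTERVAL = 10, 4, 200, 0.1, 5

# Toy objective: worker i holds f_i(x) = 0.5 * ||x - c_i||^2, so the
# global optimum of the average loss is the mean of the c_i.
centers = np.random.randn(WORKERS, DIM)

global_model = np.zeros(DIM)
local_models = [global_model.copy() for _ in range(WORKERS)]

for step in range(STEPS):
    grads = []
    for i in range(WORKERS):
        g = local_models[i] - centers[i]   # gradient of f_i at the local model
        grads.append(g)
        local_models[i] -= LR * g          # local compensation between pulls
    # Gradients are still pushed every step; the server updates the global model.
    global_model -= LR * np.mean(grads, axis=0)
    # Workers pull the global model only intermittently.
    if (step + 1) % PULL_INTERVAL == 0:
        for i in range(WORKERS):
            local_models[i] = global_model.copy()

print("distance to optimum:", np.linalg.norm(global_model - centers.mean(axis=0)))

With PULL_INTERVAL = 1 this reduces to fully synchronous SGD; larger values cut pull traffic at the cost of staleness, which the local updates partially offset.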
Pages: 779-791
Page count: 13