Self-Adaptive Gradient Quantization for Geo-Distributed Machine Learning Over Heterogeneous and Dynamic Networks

Cited by: 7
Authors
Fan, Chenyu [1]
Zhang, Xiaoning [1]
Zhao, Yangming [2,3]
Liu, Yutao [1]
Yu, Shui [4]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Peoples R China
[3] Univ Sci & Technol China, Suzhou Inst Adv Res, Suzhou 215123, Peoples R China
[4] Univ Technol Sydney, Sch Comp Sci, Sydney 2007, Australia
Keywords
Geo-Distributed machine learning; gradient quantization; resource scheduling; wide area network
DOI
10.1109/TCC.2023.3292525
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Classification Code
0812
Abstract
Geo-Distributed Machine Learning (Geo-DML) has been proposed to coordinate geographically dispersed data centers (DCs) in training large-scale machine learning (ML) models for various applications. While Geo-DML can achieve excellent performance, it also injects massive traffic into Wide Area Networks (WANs) to exchange gradients during the training process. Such heavy traffic not only incurs network congestion and prolongs training, but also causes the straggler problem when DCs operate in heterogeneous network environments. To alleviate these problems, we propose Self-Adaptive Gradient Quantization (SAGQ) for Geo-DML. In SAGQ, each worker DC adopts a specific quantization method based on its heterogeneous and dynamic link bandwidth, reducing communication overhead and balancing communication time across worker DCs. By doing so, SAGQ speeds up Geo-DML training without sacrificing ML model performance. Extensive experiments show that, compared with state-of-the-art techniques, SAGQ reduces the wall-clock time required to train an ML model by 1.13x-21.31x and improves model accuracy by 0.11%-2.27% over the baselines.
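The core mechanism sketched in the abstract, where each worker DC chooses a quantization level from its observed link bandwidth so that per-round transfer times stay roughly balanced, can be illustrated with a minimal example. The helper names (stochastic_quantize, pick_bits), the linear bit-width heuristic, and the bandwidth figures below are illustrative assumptions only; they do not reproduce the paper's actual SAGQ algorithm.

```python
import numpy as np

def stochastic_quantize(grad, num_bits):
    """Uniform stochastic quantization of a gradient vector to 2**num_bits - 1 levels.
    Returns the signed integer levels and the step size needed for dequantization."""
    levels = 2 ** num_bits - 1
    scale = np.max(np.abs(grad)) + 1e-12          # avoid division by zero
    normalized = np.abs(grad) / scale * levels    # map |g| into [0, levels]
    lower = np.floor(normalized)
    # Round up with probability equal to the fractional part (unbiased estimator)
    prob_up = normalized - lower
    quantized = lower + (np.random.rand(*grad.shape) < prob_up)
    return np.sign(grad) * quantized, scale / levels

def pick_bits(bandwidth_mbps, min_bits=2, max_bits=8, ref_bandwidth=100.0):
    """Heuristic: give slower links fewer bits so per-round transfer times stay balanced."""
    bits = int(round(max_bits * bandwidth_mbps / ref_bandwidth))
    return int(np.clip(bits, min_bits, max_bits))

# Example: three worker DCs with heterogeneous WAN bandwidths (Mbps, made up for illustration)
bandwidths = {"dc_a": 100.0, "dc_b": 40.0, "dc_c": 10.0}
grad = np.random.randn(1_000_000).astype(np.float32)

for dc, bw in bandwidths.items():
    bits = pick_bits(bw)
    q, step = stochastic_quantize(grad, bits)
    err = np.linalg.norm(q * step - grad) / np.linalg.norm(grad)
    print(f"{dc}: {bw:6.1f} Mbps -> {bits} bits, relative quantization error {err:.3f}")
```

The stochastic rounding keeps the quantizer unbiased, which is the usual reason quantized SGD still converges; slower links receive fewer bits, so their smaller payloads roughly offset their lower bandwidth.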
Pages: 3483-3496
Page count: 14