DenseStream: A Novel Data Representation for Gradient Sparsification in Distributed Synchronous SGD Algorithms

Cited by: 0
Authors
Li, Guangyao [1 ]
Liao, Mingxue [1 ]
Chao, Yongyue [1 ]
Lv, Pin [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Keywords
deep learning; AllReduce; gradient sparsification; data representation;
DOI
10.1109/IJCNN54540.2023.10191729
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed training is widely used to train large-scale deep learning models, and data parallelism is one of the dominant approaches. Data-parallel training incurs additional communication overhead, which severely degrades training at low bandwidth. Gradient sparsification is a promising technique for reducing communication volume: it keeps a small number of important gradient values and sets the rest to zero. However, the communication of sparsified gradients suffers from scalability issues because (1) the communication volume of the AllGather algorithm, which is commonly used to accumulate sparse gradients, increases linearly with the number of nodes, and (2) sparse local gradients may become dense again after accumulation. These issues hinder the application of gradient sparsification. We observe that the distribution of sparse gradient values exhibits strong locality, and we therefore propose DenseStream, a novel data representation for sparse gradients in data-parallel training that alleviates these issues. DenseStream integrates an efficient sparse AllReduce algorithm with synchronous SGD (S-SGD). Evaluations are conducted on real-world applications. Experimental results show that DenseStream achieves a better compression ratio at higher densities and can represent sparse vectors over a wider range of densities. Compared with dense AllReduce, our method is more scalable and achieves a 3.1-12.1x improvement.
Pages: 8
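
The abstract above describes top-k gradient sparsification and the linear growth of AllGather traffic with the number of workers. Below is a minimal illustrative sketch of those two points only; it does not reproduce the paper's DenseStream representation, and the function names (topk_sparsify, allgather_volume_bytes) and byte-size parameters are assumptions chosen for illustration.

    # Minimal sketch (illustrative assumptions, not the paper's DenseStream format):
    # top-k gradient sparsification and a rough per-worker AllGather receive volume.
    import numpy as np

    def topk_sparsify(grad: np.ndarray, density: float):
        """Keep the `density` fraction of largest-magnitude entries; zero out the rest.
        Returns the (indices, values) coordinate encoding that AllGather-based
        sparse aggregation typically exchanges between workers."""
        k = max(1, int(density * grad.size))
        flat = grad.ravel()
        idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k magnitudes
        return idx.astype(np.int64), flat[idx]

    def allgather_volume_bytes(num_params: int, density: float, num_workers: int,
                               value_bytes: int = 4, index_bytes: int = 8) -> int:
        """Bytes each worker receives when every worker AllGathers its k (index, value)
        pairs: the volume grows linearly with the number of workers, which is the
        scalability issue the abstract points out."""
        k = max(1, int(density * num_params))
        return (num_workers - 1) * k * (value_bytes + index_bytes)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        grad = rng.standard_normal(1_000_000).astype(np.float32)
        idx, vals = topk_sparsify(grad, density=0.01)
        print(f"kept {idx.size} of {grad.size} gradient entries")
        for n in (4, 16, 64):
            mib = allgather_volume_bytes(grad.size, 0.01, n) / 2**20
            print(f"{n:3d} workers -> ~{mib:.1f} MiB received per worker per step")

For comparison, a dense ring AllReduce exchanges a volume that is roughly independent of the number of workers, which is why the paper targets a sparse AllReduce rather than the linearly growing AllGather.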