DenseStream: A Novel Data Representation for Gradient Sparsification in Distributed Synchronous SGD Algorithms

Cited by: 0
Authors
Li, Guangyao [1 ]
Liao, Mingxue [1 ]
Chao, Yongyue [1 ]
Lv, Pin [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Keywords
deep learning; AllReduce; gradient sparsification; data representation;
DOI
10.1109/IJCNN54540.2023.10191729
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed training is widely used to train large-scale deep learning models, and data parallelism is one of the dominant approaches. Data-parallel training incurs additional communication overhead, which severely degrades training at low bandwidth. Gradient sparsification is a promising technique for reducing communication volume: it keeps a small number of important gradient values and sets the rest to zero. However, the communication of sparsified gradients suffers from scalability issues because (1) the communication volume of the AllGather algorithm, which is commonly used to accumulate sparse gradients, increases linearly with the number of nodes, and (2) sparse local gradients may become dense again after accumulation. These issues hinder the application of gradient sparsification. We observe that the distribution of sparse gradient values exhibits strong locality, and we therefore propose DenseStream, a novel data representation for sparse gradients in data-parallel training that alleviates these issues. DenseStream integrates an efficient sparse AllReduce algorithm with synchronous SGD (S-SGD). Evaluations are conducted on real-world applications. Experimental results show that DenseStream achieves a better compression ratio at higher densities and can represent sparse vectors over a wider range of densities. Compared with dense AllReduce, our method is more scalable and achieves a 3.1-12.1x improvement.
Pages: 8
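
The abstract above describes top-k gradient sparsification and the linear growth of AllGather traffic with the number of workers. Below is a minimal illustrative sketch of those two points only; it does not reproduce the paper's DenseStream representation, and the function names (topk_sparsify, allgather_volume_bytes) and byte-size parameters are assumptions chosen for illustration.

    # Minimal sketch (illustrative assumptions, not the paper's DenseStream format):
    # top-k gradient sparsification and a rough per-worker AllGather receive volume.
    import numpy as np

    def topk_sparsify(grad: np.ndarray, density: float):
        """Keep the `density` fraction of largest-magnitude entries; zero out the rest.
        Returns the (indices, values) coordinate encoding that AllGather-based
        sparse aggregation typically exchanges between workers."""
        k = max(1, int(density * grad.size))
        flat = grad.ravel()
        idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k magnitudes
        return idx.astype(np.int64), flat[idx]

    def allgather_volume_bytes(num_params: int, density: float, num_workers: int,
                               value_bytes: int = 4, index_bytes: int = 8) -> int:
        """Bytes each worker receives when every worker AllGathers its k (index, value)
        pairs: the volume grows linearly with the number of workers, which is the
        scalability issue the abstract points out."""
        k = max(1, int(density * num_params))
        return (num_workers - 1) * k * (value_bytes + index_bytes)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        grad = rng.standard_normal(1_000_000).astype(np.float32)
        idx, vals = topk_sparsify(grad, density=0.01)
        print(f"kept {idx.size} of {grad.size} gradient entries")
        for n in (4, 16, 64):
            mib = allgather_volume_bytes(grad.size, 0.01, n) / 2**20
            print(f"{n:3d} workers -> ~{mib:.1f} MiB received per worker per step")

For comparison, a dense ring AllReduce exchanges a volume that is roughly independent of the number of workers, which is why the paper targets a sparse AllReduce rather than the linearly growing AllGather.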