Efficient Training of Large-Scale Neural Networks Using Linear Pipeline Broadcast

Times Cited: 0
Authors
Yu, Chanhee [1 ,3 ]
Park, Kyongseok [2 ,3 ]
Affiliations
[1] Univ Sci & Technol, Dept Big Data Sci, Daejeon 34112, South Korea
[2] Univ Sci & Technol, Dept Appl AI, Daejeon 34112, South Korea
[3] Korea Inst Sci & Technol Informat, Ctr Supercomp Technol Dev, Daejeon 34141, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Training; Memory management; Pipeline processing; Throughput; Backpropagation; Computational modeling; Neural networks; Synchronization; Performance evaluation; Deep learning; Broadcast; parallel pattern; deep learning; distributed training; pipeline parallelism;
DOI
10.1109/ACCESS.2024.3492314
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Recently, the adoption of deep learning models across domains and tasks has increased, and with it the number of layers and parameters needed to achieve the required performance. The memory required for model training has grown accordingly, advancing the adoption and exploration of distributed training. In distributed training, models whose training demands large amounts of memory are generally handled with model parallelism techniques. Among these, layer pipelining, which divides the model into layers and assigns the resulting stages to devices, has attracted interest. Activation recomputation is a popular method for exploiting pipeline parallelism while minimizing memory consumption; however, its redundant operations can reduce training throughput. This study therefore introduces a forward propagation technique that employs a linear pipeline broadcast method, decreasing memory consumption by partially integrating recomputation into PipeDream-Flush while mitigating the resulting reduction in training throughput. The proposed broadcast-based forward propagation offsets the overhead of activation recomputation by optimizing network communication between pipeline stages and reducing bubbles in the warm-up phase of the pipeline. Experimental results demonstrate that the proposed technique reduces memory consumption by approximately 36.0% at peak training throughput for GPT2 compared with PipeDream-Flush, without a significant decrease in training throughput. Relative to PipeDream-Flush, the proposed method also achieved 14.6% and 12.6% higher peak training throughput for the ResNet152 and VGG19 models, respectively, while consuming 30.1% and 12.0% less memory.
Pages: 165653-165662
Number of pages: 10
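The core communication pattern named in the abstract, a linear (chain) pipeline broadcast between stages, can be sketched roughly as follows: each stage relays the activation to its successor in chunks, so that forwarding one chunk overlaps with receiving the next. This is a minimal sketch under assumptions of our own (torch.distributed point-to-point calls, a hypothetical linear_pipeline_broadcast function, and an arbitrary chunk count); it is not the authors' implementation.

# Minimal sketch of a linear (chain) pipeline broadcast between adjacent
# pipeline stages. Assumes dist.init_process_group(...) has already been
# called and that ranks are ordered by pipeline stage. Function name, chunk
# count, and buffer handling are illustrative assumptions, not the paper's code.
import torch
import torch.distributed as dist

def linear_pipeline_broadcast(activation: torch.Tensor, num_chunks: int = 4) -> torch.Tensor:
    """Relay `activation` down the stage chain in chunks so that forwarding
    chunk i to the next stage overlaps with receiving chunk i+1.
    On the first stage, `activation` holds the computed output; every later
    stage passes a same-shaped receive buffer that is filled in place."""
    rank = dist.get_rank()
    world = dist.get_world_size()
    prev_rank = rank - 1 if rank > 0 else None
    next_rank = rank + 1 if rank < world - 1 else None

    # Chunking a contiguous tensor along dim 0 yields contiguous views,
    # so receives write directly into `activation`.
    chunks = list(activation.chunk(num_chunks))
    pending = []
    for chunk in chunks:
        if prev_rank is not None:
            dist.recv(chunk, src=prev_rank)  # get this chunk from the previous stage
        if next_rank is not None:
            pending.append(dist.isend(chunk, dst=next_rank))  # forward it immediately
    for work in pending:
        work.wait()
    return activation

Chunked relaying is what distinguishes a pipelined linear broadcast from a plain send/recv between stages: downstream stages begin receiving, and forwarding, before the full activation has left the source stage, which is how bubbles in the warm-up phase can be reduced.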