Efficient Training of Large-Scale Neural Networks Using Linear Pipeline Broadcast

Times Cited: 0
Authors
Yu, Chanhee [1 ,3 ]
Park, Kyongseok [2 ,3 ]
Affiliations
[1] Univ Sci & Technol, Dept Big Data Sci, Daejeon 34112, South Korea
[2] Univ Sci & Technol, Dept Appl AI, Daejeon 34112, South Korea
[3] Korea Inst Sci & Technol Informat, Ctr Supercomp Technol Dev, Daejeon 34141, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Training; Memory management; Pipeline processing; Throughput; Backpropagation; Computational modeling; Neural networks; Synchronization; Performance evaluation; Deep learning; Broadcast; parallel pattern; deep learning; distributed training; pipeline parallelism;
DOI
10.1109/ACCESS.2024.3492314
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Recently, the adoption of deep learning models across domains and tasks has increased, and with it the number of layers and parameters needed to achieve the required performance. The memory required for model training has grown accordingly, advancing the adoption and exploration of distributed training. In distributed training, models whose training demands large amounts of memory are generally handled with model parallelism techniques. Among these, layer pipelining, which divides the model into layers and assigns the resulting stages to devices, has attracted interest. Activation recomputation is a popular method for exploiting pipeline parallelism while minimizing memory consumption; however, its redundant operations can reduce training throughput. This study therefore introduces a forward propagation technique that employs a linear pipeline broadcast method, decreasing memory consumption by partially integrating recomputation into PipeDream-Flush while mitigating the resulting reduction in training throughput. The proposed broadcast-based forward propagation offsets the overhead of activation recomputation by optimizing network communication between pipeline stages and reducing bubbles in the warm-up phase of the pipeline. Experimental results demonstrate that the proposed technique reduces memory consumption by approximately 36.0% at peak training throughput for GPT2 compared with PipeDream-Flush, without a significant decrease in training throughput. Relative to PipeDream-Flush, the proposed method also achieved 14.6% and 12.6% higher peak training throughput for the ResNet152 and VGG19 models, respectively, while consuming 30.1% and 12.0% less memory.
Pages: 165653-165662
Number of pages: 10
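The core communication pattern named in the abstract, a linear (chain) pipeline broadcast between stages, can be sketched roughly as follows: each stage relays the activation to its successor in chunks, so that forwarding one chunk overlaps with receiving the next. This is a minimal sketch under assumptions of our own (torch.distributed point-to-point calls, a hypothetical linear_pipeline_broadcast function, and an arbitrary chunk count); it is not the authors' implementation.

# Minimal sketch of a linear (chain) pipeline broadcast between adjacent
# pipeline stages. Assumes dist.init_process_group(...) has already been
# called and that ranks are ordered by pipeline stage. Function name, chunk
# count, and buffer handling are illustrative assumptions, not the paper's code.
import torch
import torch.distributed as dist

def linear_pipeline_broadcast(activation: torch.Tensor, num_chunks: int = 4) -> torch.Tensor:
    """Relay `activation` down the stage chain in chunks so that forwarding
    chunk i to the next stage overlaps with receiving chunk i+1.
    On the first stage, `activation` holds the computed output; every later
    stage passes a same-shaped receive buffer that is filled in place."""
    rank = dist.get_rank()
    world = dist.get_world_size()
    prev_rank = rank - 1 if rank > 0 else None
    next_rank = rank + 1 if rank < world - 1 else None

    # Chunking a contiguous tensor along dim 0 yields contiguous views,
    # so receives write directly into `activation`.
    chunks = list(activation.chunk(num_chunks))
    pending = []
    for chunk in chunks:
        if prev_rank is not None:
            dist.recv(chunk, src=prev_rank)  # get this chunk from the previous stage
        if next_rank is not None:
            pending.append(dist.isend(chunk, dst=next_rank))  # forward it immediately
    for work in pending:
        work.wait()
    return activation

Chunked relaying is what distinguishes a pipelined linear broadcast from a plain send/recv between stages: downstream stages begin receiving, and forwarding, before the full activation has left the source stage, which is how bubbles in the warm-up phase can be reduced.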