swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer

Cited by: 8
Authors
Li, Mingfan [1]
Lin, Han [1]
Chen, Junshi [1]
Diaz, Jose Monsalve [2]
Xiao, Qian [3]
Lin, Rongfen [3]
Wang, Fei [3]
Gao, Guang R. [2]
An, Hong [1]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Anhui 230026, Peoples R China
[2] Univ Delaware, Elect & Comp Engn, Newark, DE USA
[3] Wuxi Jiangnan Inst Comp Technol, Wuxi 214083, Jiangsu, Peoples R China
Keywords
Deep learning; High performance computing; Convolutional neural networks; Cancerous region detection; MODEL
DOI
10.1016/j.ins.2020.12.079
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning is widely used in many fields, and a number of models and software frameworks have been proposed. However, it remains difficult to run deep learning tasks efficiently on traditional high performance computing (HPC) systems. In this paper, we propose swFLOW, a large-scale distributed framework for deep learning on Sunway TaihuLight. Based on a performance analysis of convolutional neural networks (CNNs), we optimize the convolutional layer and obtain a 10.42x speedup over the original version. For distributed training, we use the elastic averaging stochastic gradient descent (EASGD) algorithm to reduce communication. On 512 processes, we achieve a parallel efficiency of 81.01% with communication period τ = 8. In particular, a decentralized implementation of the distributed swFLOW system is presented to alleviate the bottleneck of the central server. Using the distributed swFLOW system, we can scale the batch size up to 4096 across 1024 concurrent processes for a cancerous region detection algorithm. The successful application of swFLOW shows the great opportunity in combining deep learning with HPC systems. © 2021 Elsevier Inc. All rights reserved.
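As an illustration of how a communication period reduces traffic in EASGD, the following is a minimal single-process sketch, not swFLOW's implementation. It simulates several workers minimising a toy quadratic objective; the function name `easgd`, the objective, and all hyperparameter values (`lr`, `alpha`, the noise level) are assumptions chosen for the example. Workers take local noisy gradient steps and exchange with the center variable only every `tau` steps.

```python
import random


def easgd(num_workers=4, steps=200, tau=8, lr=0.05, alpha=0.1, seed=0):
    """Toy EASGD: minimise f(x) = 0.5 * (x - 3)^2 with noisy gradients.

    Hypothetical illustration only; the objective and hyperparameters
    are assumptions, not values taken from the swFLOW paper.
    """
    rng = random.Random(seed)
    target = 3.0
    workers = [rng.uniform(-5.0, 5.0) for _ in range(num_workers)]
    center = 0.0
    exchanges = 0

    for t in range(steps):
        if t % tau == 0:
            # Elastic exchange: each worker moves toward the center, and the
            # center moves toward the workers by the same elastic force.
            diffs = [alpha * (x - center) for x in workers]
            workers = [x - d for x, d in zip(workers, diffs)]
            center += sum(diffs)
            exchanges += 1
        # Local SGD step on every worker, with no communication.
        workers = [
            x - lr * ((x - target) + rng.gauss(0.0, 0.1)) for x in workers
        ]

    return center, workers, exchanges


if __name__ == "__main__":
    center, workers, exchanges = easgd()
    print("center:", center, "exchanges:", exchanges)
```

With `steps=200`, setting `tau=8` triggers an exchange on 25 steps instead of all 200, which is the sense in which a larger communication period trades synchronisation frequency for lower communication cost.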
Pages: 831-847
Number of pages: 17