A Quick Survey on Large Scale Distributed Deep Learning Systems

Authors
Zhang, Zhaoning [1]
Yin, Lujia [1]
Peng, Yuxing [1]
Li, Dongsheng [1]
Affiliation
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Lab, Changsha, Hunan, Peoples R China
Source
2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018) | 2018
Keywords
Deep Learning; Distributed Systems; Large Scale
DOI
10.1109/ICPADS.2018.00142
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning has been widely adopted across many fields, where it plays a major role and performs very well. As it gradually penetrates these fields, the data volume of each application is growing tremendously, and so are the computational complexity and the number of model parameters. As a result, training and inference are time consuming. For example, training a classic ResNet-50 classification model on the ImageNet data set takes 14 days on an NVIDIA M40 GPU. Distributed acceleration is therefore a very useful way to dispatch the computation of training, and even inference, to many nodes in parallel and speed up the whole process. Facebook's work and UC Berkeley's work can train the ResNet-50 model within an hour and within minutes, respectively, using distributed deep learning algorithms and systems. Like other forms of distributed acceleration, this makes it possible to cut the training of large models on large data sets from weeks to minutes, which gives researchers and developers more room to explore and search. However, beyond acceleration, what other issues will distributed deep learning systems confront? Where is the upper limit of acceleration? What applications will acceleration be used for? What is the price and cost of acceleration? In this paper, we take a simple and quick survey of distributed deep learning systems from the algorithm, distributed system, and application perspectives. We present several excellent recent works and analyze the restrictions and prospects of distributed methods.
Pages: 1052-1056
Page count: 5