Self-aware distributed deep learning framework for heterogeneous IoT edge devices

Cited by: 19
Authors
Jin, Yi [1 ]
Cai, Jiawei [1 ]
Xu, Jiawei [1 ]
Huan, Yuxiang [1 ,2 ]
Yan, Yulong [1 ]
Huang, Bin [1 ]
Guo, Yongliang [1 ]
Zheng, Lirong [1 ]
Zou, Zhuo [1 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, State Key Lab ASIC & Syst, Shanghai, Peoples R China
[2] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2021, Vol. 125
Funding
National Natural Science Foundation of China
Keywords
Internet of Things (IoT); Edge computing; Distributed deep learning; Deep neural networks; Self-awareness; ARCHITECTURE;
DOI
10.1016/j.future.2021.07.010
Chinese Library Classification
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Implementing artificial intelligence (AI) in the Internet of Things (IoT) involves moving from the cloud to heterogeneous, low-power edge devices, driven by an urgent demand to deploy complex training tasks in a distributed and reliable manner. This work proposes a self-aware distributed deep learning (DDL) framework for IoT applications that targets heterogeneous edge devices, aiming to improve adaptivity and amortize the training cost. The self-aware design, comprising a dynamic self-organizing approach and a self-healing method, enhances system reliability and resilience. Three typical edge devices are adopted with cross-platform Docker deployment: Personal Computers (PC) as general computing devices, Raspberry Pi 4Bs (Rpi) as resource-constrained edge devices, and Jetson Nanos (Jts) as AI-enabled edge devices. Benchmarked with ResNet-32 on CIFAR-10, the training efficiency of the tested distributed clusters is up to 8.44x that of a standalone Rpi. The cluster with 11 heterogeneous edge devices achieves a training efficiency of 200.4 images/s and an accuracy of 92.45%. Results show that the self-organizing approach handles dynamic changes such as devices being removed or added. The self-healing method is evaluated under various stabilities, cluster scales, and breakdown cases, demonstrating that reliability can be largely enhanced for extensively distributed deployments. The proposed DDL framework thus delivers highly scalable and reliable training on heterogeneous edge devices in IoT applications. (C) 2021 Elsevier B.V. All rights reserved.
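The training regime the abstract summarizes, synchronous data-parallel learning in which each device computes a gradient on its local shard and failed workers are tolerated, can be illustrated with a minimal, self-contained sketch. Everything below (the linear model, the shard sizes standing in for PC/Rpi/Jts devices, and the `alive` mask standing in for self-healing exclusion of broken workers) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

def worker_gradient(w, X, y):
    # Gradient of the local MSE loss 0.5*||Xw - y||^2 / n on one worker's shard
    n = len(y)
    return X.T @ (X @ w - y) / n

def distributed_sgd(shards, w, lr=0.1, steps=100, alive=None):
    """Synchronous data-parallel SGD: each step averages the gradients
    of the workers that are still alive. Excluding failed workers from
    the average mimics a self-healing-style aggregation."""
    if alive is None:
        alive = [True] * len(shards)
    for _ in range(steps):
        grads = [worker_gradient(w, X, y)
                 for (X, y), ok in zip(shards, alive) if ok]
        w = w - lr * np.mean(grads, axis=0)
    return w

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

# Heterogeneous shard sizes mimic devices of different capability
shards = []
for n in (200, 50, 100):
    X = rng.normal(size=(n, 2))
    shards.append((X, X @ w_true))

w = distributed_sgd(shards, np.zeros(2))
print(np.allclose(w, w_true, atol=1e-2))
```

Dropping a worker mid-run (e.g. `alive=[True, False, True]`) still converges on the surviving shards, which is the intuition behind excluding broken devices rather than stalling the whole cluster.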
Pages: 908-920 (13 pages)