Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices

Cited by: 1
Authors
Ye, Shengyuan [1 ]
Zeng, Liekang [1 ]
Chu, Xiaowen [2 ]
Xing, Guoliang [3 ]
Chen, Xu [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] HKUST GZ, Data Sci & Analyt Thrust, Guangzhou, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL CONFERENCE ON MOBILE COMPUTING AND NETWORKING, ACM MOBICOM 2024 | 2024
Funding
U.S. National Science Foundation (NSF)
Keywords
Edge intelligence; distributed machine learning; data parallelism; pipeline parallelism; hybrid parallelism; inference
DOI
10.1145/3636534.3649363
Chinese Library Classification (CLC)
TN [Electronic Technology; Communication Technology]
Discipline Classification Code
0809
Abstract
On-device Deep Neural Network (DNN) training has been recognized as crucial for privacy-preserving machine learning at the edge. However, the intensive training workload and limited onboard computing resources pose significant challenges to the availability and efficiency of model training. While existing works address these challenges through native resource management optimization, we instead leverage our observation that edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources beyond a single terminal. We propose Asteroid, a distributed edge training system that breaks the resource walls across heterogeneous edge devices for efficient model training acceleration. Asteroid adopts hybrid pipeline parallelism to orchestrate distributed training, along with judicious parallelism planning that maximizes throughput under given resource constraints. Furthermore, a fault-tolerant yet lightweight pipeline replay mechanism is developed to tame device-level dynamics for training robustness and performance stability. We implement Asteroid on heterogeneous edge devices with both vision and language models; evaluations demonstrate training up to 12.2x faster than conventional parallelism methods and 2.1x faster than state-of-the-art hybrid parallelism methods. Furthermore, Asteroid can recover the training pipeline 14x faster than baseline methods while preserving comparable throughput despite unexpected device exits and failures.
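To make the abstract's core idea concrete, below is a minimal single-process PyTorch sketch of micro-batched pipeline parallelism, the building block behind hybrid pipeline parallelism. It is an illustrative toy under stated assumptions, not Asteroid's implementation: the two-stage split, tensor shapes, and micro-batch count are arbitrary, and in a real pipeline the stages would run on separate edge devices with activations exchanged over the network.

import torch
import torch.nn as nn

# Two sequential "stages" of one model; in a real pipeline each stage
# would be placed on a different edge device (hypothetical split).
stage0 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
stage1 = nn.Sequential(nn.Linear(64, 10))
opt = torch.optim.SGD(list(stage0.parameters()) + list(stage1.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32)               # one mini-batch of inputs
y = torch.randint(0, 10, (16,))       # labels for 10 classes
num_micro = 4                         # pipeline depth via micro-batching

opt.zero_grad()
for xs, ys in zip(x.chunk(num_micro), y.chunk(num_micro)):
    act = stage0(xs)                  # stage-0 forward ("device A")
    out = stage1(act)                 # stage-1 forward ("device B")
    # Scale the per-micro-batch loss so the accumulated gradient equals
    # the full mini-batch gradient (equal-sized micro-batches assumed).
    loss = loss_fn(out, ys) / num_micro
    loss.backward()                   # gradients accumulate across micro-batches
opt.step()                            # one synchronized update per mini-batch

Hybrid parallelism, as the abstract uses the term, additionally replicates compute-heavy stages across devices in a data-parallel fashion; the planning problem Asteroid tackles is choosing the stage partition and replication so that pipeline throughput is maximized under each device's resource constraints.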
Pages: 312-326
Page count: 15