Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing

Cited by: 0
Authors
Yu, Yongbo [1 ]
Yu, Fuxun [1 ]
Xu, Zirui [1 ]
Wang, Di [2 ]
Zhang, Mingjia [2 ]
Li, Ang [3 ]
Bray, Shawn [4 ]
Liu, Chenchen [4 ]
Chen, Xiang [1 ]
Affiliations
[1] George Mason Univ, Fairfax, VA 22030 USA
[2] Microsoft, Redmond, WA USA
[3] Duke Univ, Durham, NC USA
[4] Univ Maryland, Baltimore, MD USA
Source
COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION | 2022
Keywords
Federated Learning; Multi-Task Learning; GPU Resource Allocation
DOI
10.1145/3487553.3524859
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Federated learning (FL) increasingly involves compound learning tasks as cognitive applications grow in complexity. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection, classification, etc.) and expects FL to sustain life-long learning across all of them. However, our analysis demonstrates that deploying compound FL models for multiple training tasks on a GPU raises two issues: (1) because different tasks' skewed data distributions and corresponding models create highly imbalanced learning workloads, current GPU scheduling methods fail to allocate resources effectively; and (2) consequently, existing FL schemes, which focus only on heterogeneous data distributions while ignoring runtime computing, cannot achieve optimally synchronized federation in practice. To address these issues, we propose a full-stack FL optimization scheme that handles both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Our work offers two key insights in this research domain: (1) competitive resource sharing benefits parallel model execution, and the proposed concept of a "virtual resource" can effectively characterize and guide practical per-task resource utilization and allocation; and (2) FL can be further improved by taking architecture-level coordination into consideration. Our experiments demonstrate that FL throughput can be significantly increased.
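
To make the competitive-sharing idea concrete, below is a minimal illustrative sketch (not taken from the paper) of running several training tasks as concurrent clients on one GPU under per-task resource caps, using NVIDIA MPS's CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable; the task scripts and the share values are hypothetical placeholders for whatever allocation a "virtual resource" analysis would produce.

import os
import subprocess

# Hypothetical training scripts and per-task SM shares (percent).
# An imbalanced split mimics skewed multi-task workloads: the heavier
# detection task gets a larger cap than the lighter classification task.
TASKS = {
    "detection_train.py": 60,
    "classification_train.py": 40,
}

procs = []
for script, share in TASKS.items():
    env = os.environ.copy()
    # With an MPS daemon running, this caps the fraction of SMs
    # that this client process may occupy.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(share)
    # Launch each task as a separate MPS client so the tasks execute
    # in parallel and compete for the GPU under their assigned caps.
    procs.append(subprocess.Popen(["python", script], env=env))

for p in procs:
    p.wait()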
Pages: 567-571
Page count: 5