Distributed Computing with Heterogeneous Servers

Cited: 0
Authors
Xu, Jiasheng [1 ]
Fu, Luoyi [2 ]
Wang, Xinbing [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci, Shanghai, Peoples R China
Source
2020 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM) | 2020
Funding
National Key R&D Program of China;
Keywords
distributed computing; coded computation; heterogeneous servers; straggling effect;
DOI
10.1109/GLOBECOM42002.2020.9322379
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed computing is known for its high efficiency in processing large amounts of data in parallel, at the expense of communication load between servers. Coding was introduced to minimize this communication load by exploiting repetitive computation, and has thus drawn great attention in academia. Most existing works assume that all servers have identical computational capability, which is inconsistent with practical scenarios. In this paper, we investigate a distributed computing system that consists of two types of servers, i.e., fast servers and slow servers. Due to the heterogeneous computational capabilities within the system, the overall computation time is delayed by the slow servers, a phenomenon known as the straggling effect. To this end, we develop a novel framework of coding-based distributed computing to alleviate the straggling effect. Specifically, for a given number of fast servers and slow servers with their corresponding computational capabilities, we aim to minimize the overall computation time by assigning different amounts of workload to different servers. Further, we derive the information-theoretic lower bound on the communication load of the system, which is shown to be within a constant multiplicative gap of the communication load achieved by our scheme.
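To make the workload-assignment idea in the abstract concrete, the following minimal Python sketch (an illustration under our own assumptions, not the paper's coding scheme) splits a total workload across fast and slow servers in proportion to their speeds, so that every server finishes at the same time and no slow server straggles. The server counts, speed values, and function name assign_workloads are hypothetical placeholders.

# Sketch: proportional workload assignment for heterogeneous servers.
# Giving server i a load of W * s_i / sum(s) makes every finish time
# equal to W / sum(s), so the makespan is not dominated by slow servers.

def assign_workloads(total_work, speeds):
    """Return each server's share of work, proportional to its speed."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]

# Hypothetical example: 4 fast servers (2 work units/s), 2 slow (1 unit/s).
speeds = [2.0] * 4 + [1.0] * 2
loads = assign_workloads(total_work=100.0, speeds=speeds)
finish_times = [load / s for load, s in zip(loads, speeds)]
print(loads)         # fast servers get twice the load of slow ones
print(finish_times)  # all equal: 100 / 10 = 10.0 seconds

Under these assumptions the fast servers each receive 20 units and the slow servers 10, and all six finish in 10 seconds; the paper's actual scheme further layers coded computation on top of such an assignment to reduce the communication load.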
Pages: 6