A Fundamental Tradeoff Between Computation and Communication in Distributed Computing

被引:328
作者
Li, Songze [1 ]
Maddah-Ali, Mohammad Ali [2 ]
Yu, Qian [1 ]
Avestimehr, A. Salman [1 ]
机构
[1] Univ Southern Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
[2] Sharif Univ Technol, Dept Elect Engn, Tehran 11365, Iran
关键词
Distributed computing; MapReduce; computation-communication tradeoff; coded multicasting; coded TeraSort; SUM;
D O I
10.1109/TIT.2017.2756959
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other. More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of "Map" and "Reduce" functions distributedly across multiple computing nodes. A coded scheme, named "coded distributed computing" (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of r (i.e., evaluating each function at r carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor. An information-theoretic lower bound on the communication load is also provided, which matches the communication load achieved by the CDC scheme. As a result, the optimal computation-communication tradeoff in distributed computing is exactly characterized. Finally, the coding techniques of CDC is applied to the Hadoop TeraSort benchmark to develop a novel CodedTeraSort algorithm, which is empirically demonstrated to speed up the overall job execution by 1.97x - 3.39x, for typical settings of interest.
引用
收藏
页码:109 / 128
页数:20
相关论文
共 52 条
[1]   Network information flow [J].
Ahlswede, R ;
Cai, N ;
Li, SYR ;
Yeung, RW .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2000, 46 (04) :1204-1216
[2]  
Ahmad Faraz., 2012, ACM SIGARCH Computer Architecture News, V40, P61
[3]   A scalable, commodity data center network architecture [J].
Al-Fares, Mohammad ;
Loukissas, Alexander ;
Vahdat, Amin .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2008, 38 (04) :63-74
[4]  
[Anonymous], **DROPPED REF**
[5]  
[Anonymous], 2011, Mining of Massive Datasets
[6]  
[Anonymous], 2009, Proceedings of the VLDB Endowment
[7]  
[Anonymous], 2003, P 19 ACM S OP SYST P, DOI [10.1145/1165389.945450, DOI 10.1145/1165389.945450]
[8]   Index Coding With Side Information [J].
Bar-Yossef, Ziv ;
Birk, Yitzhak ;
Jayram, T. S. ;
Kol, Tomer .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2011, 57 (03) :1479-1494
[9]  
Becker K., 1998, 5th ACM Conference on Computer and Communications Security, P1, DOI 10.1145/288090.288094
[10]   Coding on demand by an informed source (ISCOD) for efficient broadcast of different supplemental data to caching clients [J].
Birk, Yitzhak ;
Kol, Tomer .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (06) :2825-2830