Hierarchical Coded Matrix Multiplication

被引:28
作者
Kianidehkordi, Shahrzad [1 ]
Ferdinand, Nuwan [1 ,2 ]
Draper, Stark C. [1 ]
机构
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada
[2] Huawei Technol Canada Co Ltd, Ottawa, ON K2K 3J1, Canada
基金
加拿大自然科学与工程研究理事会; 美国国家科学基金会;
关键词
Encoding; Task analysis; Acceleration; Computational modeling; Interleaved codes; Error correction codes; Visualization; Distributed system; coded computing; stragglers; hierarchical coding; partial computation;
D O I
10.1109/TIT.2020.3036763
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date provide robustness by ensuring that the time to finish depends only on a set of the fastest nodes. However, while stragglers do compute less work than non-stragglers, in real-world commercial cloud computing systems (e.g., Amazon's Elastic Compute Cloud (EC2)) the distinction is often a soft one. In this paper, we develop hierarchical coded computing that exploits the work completed by all nodes, both fast and slow, automatically integrating the potential contribution of each. We first present a conceptual framework to represent the division of work amongst nodes in coded matrix multiplication as a cuboid partitioning problem. This framework allows us to unify existing methods and motivates new techniques. We then develop three methods of hierarchical coded computing that we term bit-interleaved coded computation (BICC), multilevel coded computation (MLCC), and hybrid hierarchical coded computation (HHCC). In this paradigm, each worker is tasked with completing a sequence (a hierarchy) of ordered subtasks. The sequence of subtasks, and the complexity of each, is designed so that partial work completed by stragglers can be used, rather than ignored. We note that our methods can be used in conjunction with any coded computing method. We illustrate this by showing how we can use our methods to accelerate all previously developed coded computing techniques by enabling them to exploit stragglers. Under a widely studied statistical model of completion time, our approach realizes a 66% improvement in the expected finishing time. On Amazon EC2, the gain was 27% when stragglers are simulated.
引用
收藏
页码:726 / 754
页数:29
相关论文
共 34 条
[1]  
Abbe E. A, 2018, P INT C MACH LEARN I, P9716
[2]  
Ahsanullah M., 2013, An introduction to order statistics, V8
[3]  
Amiri MM, 2019, INT CONF ACOUST SPEE, P8177, DOI [10.1109/ICASSP.2019.8682911, 10.1109/tsp.2019.2952051]
[4]  
[Anonymous], 2012, PROC 26 INT C NEURAL
[5]   UPPER (LOWER) BOUNDS ON THE MEAN OF THE MAXIMUM (MINIMUM) OF A NUMBER OF RANDOM-VARIABLES [J].
AVEN, T .
JOURNAL OF APPLIED PROBABILITY, 1985, 22 (03) :723-728
[6]   Bit-interleaved coded modulation [J].
Caire, G ;
Taricco, G ;
Biglieri, E .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1998, 44 (03) :927-946
[7]   Embodied Question Answering [J].
Das, Abhishek ;
Datta, Samyak ;
Gkioxari, Georgia ;
Lee, Stefan ;
Parikh, Devi ;
Batra, Dhruv .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :1-10
[8]   Coded Federated Learning [J].
Dhakal, Sagar ;
Prakash, Saurav ;
Yona, Yair ;
Talwar, Shilpa ;
Himayat, Nageen .
2019 IEEE GLOBECOM WORKSHOPS (GC WKSHPS), 2019,
[9]  
Didier F., 2009, ARXIV PREPRINT ARXIV
[10]  
Draper S. C., 2019, PROC IEEE INT C MACH, P1