On the Optimal Recovery Threshold of Coded Matrix Multiplication

被引:169
作者
Dutta, Sanghamitra [1 ]
Fahim, Mohammad [2 ]
Haddadpour, Farzin [2 ]
Jeong, Haewon [1 ]
Cadambe, Viveck [2 ]
Grover, Pulkit [1 ]
机构
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Penn State Univ, Dept Elect Engn, University Pk, PA 16802 USA
基金
美国国家科学基金会;
关键词
Coded computing; distributed computing; matrix multiplication; recovery threshold; stragglers; VANDERMONDE; INVERSION; MAPREDUCE; ALGORITHM; SYSTEMS; STORAGE;
D O I
10.1109/TIT.2019.2929328
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers. When a fixed 1/m fraction of each matrix can be stored at each worker node, Polynomial codes require m(2) successful workers, while our MatDot codes only require 2m - 1 successful workers. However, MatDot codes have higher computation cost per worker and higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Furthermore, we propose "PolyDot" coding that interpolates between Polynomial codes and MatDot codes to trade off computation/communication costs and recovery thresholds. Finally, we demonstrate a novel coding technique for multiplying n matrices (n >= 3) using ideas from MatDot and PolyDot codes.
引用
收藏
页码:278 / 301
页数:24
相关论文
共 64 条
[1]  
Aktas M. F., 2018, ACM SIGMETRICS PERFO, V45, P224
[2]   Coded Computation Against Processing Delays for Virtualized Cloud-Based Channel Decoding [J].
Aliasgari, Malihe ;
Kliewer, Jorg ;
Simeone, Osvaldo .
IEEE TRANSACTIONS ON COMMUNICATIONS, 2019, 67 (01) :28-38
[3]  
Ananthanarayanan Ganesh, 2014, Proceedings of NSDI '14: 11th USENIX Symposium on Networked Systems Design and Implementation. NSDI '14, P289
[4]  
[Anonymous], 2016, 2016 IEEE POWER ENER
[5]  
[Anonymous], 2018, ARXIV180409791
[6]  
[Anonymous], 2017, ACM SIGMETRICS PERFO
[7]  
[Anonymous], 2018, P INT C MACH LEARN I
[8]  
[Anonymous], 2016, P NIPS MACH LEARN SY
[9]  
[Anonymous], 2018, P INT C MACH LEARN I
[10]   Distributed Solution of Large-Scale Linear Systems via Accelerated Projection-Based Consensus [J].
Azizan-Ruhi, Navid ;
Lahouti, Farshad ;
Avestimehr, Amir Salman ;
Hassibi, Babak .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (14) :3806-3817