Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers

Cited by: 64
Authors
Severinson, Albin [1, 2, 3]
Graell i Amat, Alexandre [1 ]
Rosnes, Eirik [2 ]
Affiliations
[1] Chalmers Univ Technol, Dept Elect Engn, SE-41296 Gothenburg, Sweden
[2] Simula UiB, N-5020 Bergen, Norway
[3] Univ Bergen, Dept Informat, N-5020 Bergen, Norway
Funding
Swedish Research Council;
Keywords
Block-diagonal coding; computational delay; decoding delay; distributed computing; Luby transform codes; machine learning algorithms; maximum distance separable codes; straggling servers;
DOI
10.1109/TCOMM.2018.2877391
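Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract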
We propose two coded schemes for the distributed computing problem of multiplying a matrix by a set of vectors. The first scheme is based on partitioning the matrix into submatrices and applying maximum distance separable (MDS) codes to each submatrix. For this scheme, we prove that up to a given number of partitions the communication load and the computational delay (not including the encoding and decoding delay) are identical to those of the scheme recently proposed by Li et al., based on a single, long MDS code. However, due to the use of shorter MDS codes, our scheme yields a significantly lower overall computational delay when the delay incurred by encoding and decoding is also considered. We further propose a second coded scheme based on Luby transform (LT) codes under inactivation decoding. Interestingly, LT codes may reduce the delay over the partitioned scheme at the expense of an increased communication load. We also consider distributed computing under a deadline and show numerically that the proposed schemes outperform other schemes in the literature, with the LT code-based scheme yielding the best performance for the scenarios considered.
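As a rough illustration of the first scheme, the sketch below partitions the rows of a matrix, encodes each partition separately with a short MDS code, and recovers the product from the non-straggling results. It is a toy under stated assumptions, not the construction from the paper: it uses a Vandermonde generator over exact rationals (any k of its rows are invertible, which gives the MDS property), and all function names are hypothetical. It also ignores how coded rows are assigned to servers.

```python
from fractions import Fraction

def vandermonde(n_coded, k):
    # n_coded x k generator with distinct points 1..n_coded (assumed code choice).
    return [[Fraction(p) ** j for j in range(k)] for p in range(1, n_coded + 1)]

def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def solve(M, b):
    # Gauss-Jordan elimination over exact rationals for a k x k system.
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for c in range(k):
        piv = next(r for r in range(c, k) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(k):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def encode_block_diagonal(A, T, n_coded):
    # Split the rows of A into T partitions and encode each one independently.
    k = len(A) // T
    G = vandermonde(n_coded, k)
    return [matmul(G, A[t * k:(t + 1) * k]) for t in range(T)], G

def decode_partition(G, received):
    # received maps coded-row index -> computed value; any k entries suffice.
    k = len(G[0])
    idx = sorted(received)[:k]
    return solve([G[i] for i in idx], [received[i] for i in idx])

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
x = [1, 2, 3]
T, n_coded = 2, 3  # two partitions of two rows, each expanded to three coded rows

coded_blocks, G = encode_block_diagonal(A, T, n_coded)
recovered = []
for t, block in enumerate(coded_blocks):
    results = dict(enumerate(matvec(block, x)))
    del results[t]  # simulate a straggler: one coded result per partition is missing
    recovered += decode_partition(G, results)

expected = matvec(A, x)
print(recovered == expected)
```

Each partition is decoded by solving only a 2 x 2 system here, whereas a single long code over all four rows would require a 4 x 4 solve. That is the intuition behind the lower decoding delay the abstract attributes to shorter MDS codes.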
Pages: 1739-1753
Page count: 15
Related papers
29 in total
  • [11] Dutta S, 2017, IEEE INT SYMP INFO, P2403, DOI 10.1109/ISIT.2017.8006960
  • [12] On Decoding Complexity of Reed-Solomon Codes on the Packet Erasure Channel
    Garrammone, Giuliano
    [J]. IEEE COMMUNICATIONS LETTERS, 2013, 17 (04) : 773 - 776
  • [13] The PageRank Problem, Multiagent Consensus, and Web Aggregation: A Systems and Control Viewpoint
    Ishii, Hideaki
    Tempo, Roberto
    [J]. IEEE CONTROL SYSTEMS MAGAZINE, 2014, 34 (03) : 34 - 53
  • [14] Speeding Up Distributed Machine Learning Using Codes
    Lee, Kangwook
    Lam, Maximilian
    Pedarsani, Ramtin
    Papailiopoulos, Dimitris
    Ramchandran, Kannan
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2018, 64 (03) : 1514 - 1529
  • [15] Lee K, 2017, IEEE INT SYMP INFO, P2418, DOI 10.1109/ISIT.2017.8006963
  • [16] Li S., 2016, PROC IEEE GLOBECOM W, P1
  • [17] Li SZ, 2015, ANN ALLERTON CONF, P964, DOI 10.1109/ALLERTON.2015.7447112
  • [18] Liang G, 2014, IEEE INFOCOM SER, P826, DOI 10.1109/INFOCOM.2014.6848010
  • [19] Novel Polynomial Basis With Fast Fourier Transform and Its Application to Reed-Solomon Erasure Codes
    Lin, Sian-Jheng
    Al-Naffouri, Tareq Y.
    Han, Yunghsiang S.
    Chung, Wei-Ho
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2016, 62 (11) : 6284 - 6299
  • [20] Luby M, 2002, ANN IEEE SYMP FOUND, P271, DOI 10.1109/SFCS.2002.1181950