Learning-Based Coded Computation

Cited by: 15
Authors
Kosaian, Jack [1]
Rashmi, K. V. [1]
Venkataraman, Shivaram [2]
Affiliations
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Wisconsin, Dept Comp Sci, Madison, WI 53715 USA
Source
IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY | 2020 / Vol. 1 / No. 01
Keywords
Reliability; fault tolerance; redundancy; information theory; codes; neural networks
DOI
10.1109/JSAIT.2020.2983165
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent advances have shown the potential for coded computation to impart resilience against slowdowns and failures that occur in distributed computing systems. However, existing coded computation approaches are either unable to support non-linear computations, or can only support a limited subset of non-linear computations while requiring high resource overhead. In this work, we propose a learning-based coded computation framework to overcome the challenges of performing coded computation for general non-linear functions. We show that careful use of machine learning within the coded computation framework can extend the reach of coded computation to imparting resilience to more general non-linear computations. We showcase the applicability of learning-based coded computation to neural network inference, a major workload in production services. Our evaluation results show that learning-based coded computation enables accurate reconstruction of unavailable results from widely deployed neural networks for a variety of inference tasks such as image classification, speech recognition, and object localization. We implement our proposed approach atop an open-source prediction serving system and show its promise in alleviating slowdowns that occur in neural network inference. These results indicate the potential for learning-based approaches to open new doors for the use of coded computation for broader, non-linear computations.
Pages: 227-236
Page count: 10