Abstract cost models for distributed data-intensive computations

被引:3
作者
Li, Rundong [1 ]
Mi, Ningfang [2 ]
Riedewald, Mirek [1 ]
Sun, Yizhou [3 ]
Yao, Yi [2 ]
机构
[1] Northeastern Univ, CCIS, Boston, MA 02115 USA
[2] Northeastern Univ, ECE, Boston, MA 02115 USA
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
基金
美国国家卫生研究院; 美国国家科学基金会;
关键词
Distributed analytics; Makespan minimization; Cost model; Data partitioning; MATRIX; OPTIMIZATION; ALGORITHMS;
D O I
10.1007/s10619-018-7244-2
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We consider data analytics workloads on distributed architectures, in particular clusters of commodity machines. To find a job partitioning that minimizes running time, a cost model, which we more accurately refer to as makespan model, is needed. In attempting to find the simplest possible, but sufficiently accurate, such model, we explore piecewise linear functions of input, output, and computational complexity. They are abstract in the sense that they capture fundamental algorithm properties, but do not require explicit modeling of system and implementation details such as the number of disk accesses. We show how the simplified functional structure can be exploited to reduce optimization cost. In the general case, we identify a lower bound that can be used for search-space pruning. For applications with homogeneous tasks, we further demonstrate how to directly integrate the model into the makespan optimization process, reducing search-space dimensionality and thus complexity by orders of magnitude. Experimental results provide evidence of good prediction quality and successful makespan optimization across a variety of operators and cluster architectures.
引用
收藏
页码:411 / 439
页数:29
相关论文
共 34 条
[1]   A three-dimensional approach to parallel matrix multiplication [J].
Agarwal, RC ;
Balle, SM ;
Gustavson, FG ;
Joshi, M ;
Palkar, P .
IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 1995, 39 (05) :575-582
[2]   Learning-based Query Performance Modeling and Prediction [J].
Akdere, Mert ;
Cetintemel, Ugur ;
Riondato, Matteo ;
Upfal, Eli ;
Zdonik, Stanley B. .
2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, :390-401
[3]  
[Anonymous], 1995, TECH REP
[4]  
[Anonymous], 2011, Proceedings of the 2011 international conference on Management of data
[5]   A View of the Parallel Computing Landscape [J].
Asanovic, Krste ;
Bodik, Rastislav ;
Demmel, James ;
Keaveny, Tony ;
Keutzer, Kurt ;
Kubiatowicz, John ;
Morgan, Nelson ;
Patterson, David ;
Sen, Koushik ;
Wawrzynek, John ;
Wessel, David ;
Yelick, Katherine .
COMMUNICATIONS OF THE ACM, 2009, 52 (10) :56-67
[6]  
Ballard G., 2013, P 25 ANN ACM S PARAL, P222
[7]  
Duggan J., 2014, P 14 INT C EXT DAT T, P109
[8]   Recursive blocked algorithms and hybrid data structures for dense matrix library software [J].
Elmroth, E ;
Gustavson, F ;
Jonsson, I ;
Kågström, B .
SIAM REVIEW, 2004, 46 (01) :3-45
[9]   Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning [J].
Ganapathi, Archana ;
Kuno, Harumi ;
Dayal, Umeshwar ;
Wiener, Janet L. ;
Fox, Armando ;
Jordan, Michael ;
Patterson, David .
ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, :592-+
[10]  
Goodrich MT, 2011, LECT NOTES COMPUT SC, V7074, P374, DOI 10.1007/978-3-642-25591-5_39