A Pareto Framework for Data Analytics on Heterogeneous Systems: Implications for Green Energy Usage and Performance

Cited by: 10
Authors
Chakrabarti, Aniket [1]
Parthasarathy, Srinivasan [1]
Stewart, Christopher [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
Source
2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) | 2017
Keywords
Large Scale Analytics Framework; Pareto Frontier;
DOI
10.1109/ICPP.2017.62
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Distributed algorithms for data analytics partition their input data across many machines for parallel execution. At scale, it is likely that some machines will perform worse than others because they are slower, power constrained or dependent on undesirable, dirty energy sources. It is challenging to balance analytics workloads across heterogeneous machines because the algorithms are sensitive to statistical skew in data partitions. A skewed partition can slow down the whole workload or degrade the quality of results. Sizing partitions in proportion to each machine's performance may introduce or further exacerbate skew. In this paper, we propose a scheme that controls the statistical distribution of each partition and sizes partitions according to the heterogeneity of the computing environment. We model heterogeneity as a multi-objective optimization, with the objectives being functions for execution time and dirty energy consumption. We use stratification to control skew. Experiments show that our computational heterogeneity-aware (Het-Aware) partitioning strategy speeds up running time by up to 51% over the stratified partitioning scheme baseline. We also have a heterogeneity and energy aware (Het-Energy-Aware) partitioning scheme which is slower than the Het-Aware solution but can lower the dirty energy footprint by up to 26%. For some analytic tasks, there is also a significant qualitative benefit when using such partitioning strategies.
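The abstract frames heterogeneity as a multi-objective optimization over execution time and dirty energy consumption. The following is a minimal sketch of what selecting a Pareto frontier over those two objectives can look like. It is not the paper's implementation: the `Plan` tuple, the function names, and the candidate numbers are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of a candidate partitioning plan:
# (execution_time, dirty_energy). Both objectives are minimized.
Plan = Tuple[float, float]


def dominates(a: Plan, b: Plan) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Return the plans that no other plan dominates, in input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Made-up candidate plans, not measurements from the paper.
candidates = [(10.0, 8.0), (12.0, 5.0), (15.0, 5.5), (11.0, 9.0), (20.0, 3.0)]
frontier = pareto_frontier(candidates)
print(frontier)
```

In this toy input, `(15.0, 5.5)` is dominated by `(12.0, 5.0)` and `(11.0, 9.0)` is dominated by `(10.0, 8.0)`, so only three plans remain. Choosing among the remaining plans is then a policy decision, for example favoring speed or favoring a lower dirty energy footprint.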
Pages: 533-542
Page count: 10