SimCost: cost-effective resource provision prediction and recommendation for Spark workloads

Cited by: 3
Authors
Chen, Yuxing [1]
Hoque, Mohammad A. [1]
Xu, Pengfei [1]
Lu, Jiaheng [1]
Tarkoma, Sasu [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
Funding
Academy of Finland;
Keywords
Parameter tuning; Cost modeling; Spark; Resource provisioning; MapReduce; Optimization;
DOI
10.1007/s10619-023-07436-y
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Spark is one of the most popular big data analytical platforms. To save time, achieve high resource utilization, and remain cost-effective for Spark jobs, it is challenging but imperative for data scientists to configure suitable resource portions. In this paper, we investigate the proper parameter values that meet workloads' performance requirements with minimized resource cost and resource utilization time. We propose SimCost, a simulation-based cost model, to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, i.e., Monte Carlo simulation, which uses a small amount of data and resources to make a reliable prediction for larger datasets and clusters. Our method's salient feature is that it allows us to invest low training costs while obtaining an accurate prediction. Through empirical experiments with 12 benchmark workloads, we show that the cost model yields an average prediction error of less than 5%, and the recommendation achieves up to 6x resource cost saving.
Pages: 73-102
Page count: 30