You Only Run Once: Spark Auto-Tuning From a Single Run

Cited by: 16
Authors
Prats, David Buchaca [1]
Portella, Felipe Albuquerque [2, 3]
Costa, Carlos H. A. [4]
Berral, Josep Lluis [1]
Affiliations
[1] Barcelona Supercomp Ctr, Data Centr Comp, Barcelona 08034, Spain
[2] Univ Politecn Catalunya UPC BarcelonaTECH, Informat Technol Dept, Barcelona 08034, Spain
[3] Petr Brasileiro SA PETROBRAS, Informat Technol Dept, BR-20031912 Rio De Janeiro, Brazil
[4] IBM TJ Watson Res Ctr, Dept HPC, Yorktown Hts, NY 10598 USA
Source
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT | 2020 / Vol. 17 / No. 04
Funding
European Research Council;
Keywords
Sparks; Optimization; Predictive models; Machine learning; Bayes methods; Measurement; Standards; Decision making for workload auto-tuning; machine learning; spark auto-tuning; workload modeling; workload placement;
DOI
10.1109/TNSM.2020.3034824
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Tuning configurations of Spark jobs is not a trivial task. State-of-the-art auto-tuning systems are based on iteratively running workloads with different configurations. During the optimization process, the relevant features are explored to find good solutions. Many optimizers enhance the time-to-solution using black-box optimization algorithms that do not take into account any information from the Spark workloads. In this article, we present a new method for tuning configurations that uses information from one run of a Spark workload. To achieve good performance, we mine the SparkEventLog that is generated by the Spark engine. This log file contains a large amount of information from the executed application. We use this information to enhance a performance model with low-level features from the workload to be optimized. These features include Spark Actions, Transformations, and Task metrics. This process allows us to obtain application-specific workload information. With this information our system can predict sensible Spark configurations for unseen jobs, given that it has been trained with reasonable coverage of Spark applications. Experiments show that the presented system correctly produces good configurations, while achieving up to 80% speedup with respect to the default Spark configuration, and up to 12x speedup of the time-to-solution with respect to a standard Bayesian Optimization procedure.
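The abstract's idea of mining the SparkEventLog for workload features can be illustrated with a minimal sketch. It is not the authors' feature extractor: it only assumes the JSON-lines layout that Spark writes when event logging is enabled, where each line carries an "Event" field and task-end events carry a "Task Metrics" object. The function name `summarize_event_log`, the choice of metrics, and the sample lines are hypothetical and exist only for illustration.

```python
import json
from collections import Counter


def summarize_event_log(lines):
    """Aggregate event counts and two task metrics from event-log lines.

    `lines` is any iterable of strings, one JSON object per line.
    The selected metrics are an illustrative assumption, not the
    feature set used in the paper.
    """
    event_counts = Counter()
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        name = event.get("Event", "unknown")
        event_counts[name] += 1
        if name == "SparkListenerTaskEnd":
            # Task metrics may be absent (for example, for failed tasks).
            metrics = event.get("Task Metrics") or {}
            for key in ("Executor Run Time", "JVM GC Time"):
                totals[key] += metrics.get(key, 0)
    return {
        "event_counts": dict(event_counts),
        "num_tasks": event_counts["SparkListenerTaskEnd"],
        "total_executor_run_time_ms": totals["Executor Run Time"],
        "total_jvm_gc_time_ms": totals["JVM GC Time"],
    }


# Made-up sample lines standing in for a real event log file.
sample = [
    '{"Event": "SparkListenerJobStart", "Job ID": 0}',
    '{"Event": "SparkListenerTaskEnd", "Task Metrics": {"Executor Run Time": 120, "JVM GC Time": 5}}',
    '{"Event": "SparkListenerTaskEnd", "Task Metrics": {"Executor Run Time": 80, "JVM GC Time": 3}}',
]
features = summarize_event_log(sample)
print(features)
```

On a real log, the same function could be fed an open file handle instead of the sample list; the resulting dictionary is the kind of per-run summary that a performance model might take as input alongside the configuration values.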
Pages: 2039-2051
Number of pages: 13