TurBO: A cost-efficient configuration-based auto-tuning approach for cluster-based big data frameworks

Cited by: 3
Authors
Dou, Hui [1]
Zhang, Lei [1]
Zhang, Yiwen [1]
Chen, Pengfei [2]
Zheng, Zibin [2]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
[2] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Big data framework; Configuration parameter; Tuning cost; Bayesian optimization; Pseudo point;
DOI
10.1016/j.jpdc.2023.03.002
CLC Number
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Big data processing frameworks such as Spark usually provide a large number of performance-related configuration parameters, and how to auto-tune these parameters for better performance has been a hot issue in both academia and industry for years. Through a delicate tradeoff between exploration and exploitation, Bayesian Optimization (BO) is currently the most appealing algorithm for configuration auto-tuning. However, considering the tuning cost constraint in practice, three critical limitations prevent conventional BO-based approaches from being directly applied to auto-tuning cluster-based big data frameworks. In this paper, we propose a cost-efficient configuration auto-tuning approach named TurBO for big data frameworks based on two enhancements of vanilla BO: 1) to reduce the required number of iterations, TurBO integrates a well-designed adaptive pseudo point mechanism with BO; 2) to avoid the time-consuming practical evaluation of sub-optimal configurations as much as possible, TurBO leverages the proposed CASampling method to intelligently handle these sub-optimal configurations based on ensemble learning with historical tuning experiences. To evaluate the performance of TurBO, we conducted a series of experiments on a local Spark cluster with 9 different HiBench benchmark applications. Overall, compared with 3 representative BO-based baseline approaches, OpenTuner, Bliss and ResTune, TurBO speeds up the tuning procedure by 2.24x, 2.29x and 1.97x on average, respectively. Besides, TurBO always achieves a positive cumulative performance gain under the simulated dynamic workload scenario, which indicates that TurBO is well suited to workload changes of big data applications. (c) 2023 Elsevier Inc. All rights reserved.
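As a rough illustration of the BO-based tuning idea summarized in the abstract, the sketch below tunes two hypothetical Spark parameters with a Gaussian-process surrogate and an Expected Improvement acquisition, and records a predicted "pseudo point" instead of running the workload whenever the surrogate is fairly confident a candidate is sub-optimal. The run_workload function, the parameter bounds and the 2-sigma skip rule are illustrative assumptions only; this is not the paper's adaptive pseudo point mechanism or CASampling method.

# Minimal BO-with-pseudo-points sketch (illustrative only, not TurBO itself).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Hypothetical 2-D configuration space: (executor cores, executor memory in GB).
BOUNDS = np.array([[1.0, 8.0], [1.0, 16.0]])

def run_workload(cfg):
    """Stand-in for running a Spark job under this configuration and
    measuring its execution time (the expensive step in real tuning)."""
    cores, mem = cfg
    return 100.0 / cores + 50.0 / mem + rng.normal(0.0, 1.0)

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Bootstrap the surrogate with a few real measurements.
X = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(3, 2))
y = np.array([run_workload(x) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(500, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    nxt = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]

    mu_n, sd_n = gp.predict(nxt.reshape(1, -1), return_std=True)
    if mu_n[0] - 2.0 * sd_n[0] > y.min():
        # The surrogate is fairly sure this candidate is worse than the
        # current best: store a pseudo point with the predicted value
        # instead of paying for a real run.
        y_new = mu_n[0]
    else:
        y_new = run_workload(nxt)  # expensive real evaluation

    X, y = np.vstack([X, nxt]), np.append(y, y_new)

print("best configuration:", X[np.argmin(y)], "estimated runtime:", y.min())

A real tuner must also decide when such shortcuts are safe to take and how to exploit historical tuning experience, which this fixed 2-sigma rule only crudely approximates.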
Pages: 89-105
Page count: 17
References
45 in total
[1] Alipourfard O., 2017, Proceedings of NSDI '17: 14th USENIX Symposium on Networked Systems Design and Implementation, p. 469.
[2] Anonymous, 2016, Apache Hadoop.
[3] Anonymous, 2019, Apache Spark.
[4] Ansel J., Kamil S., Veeramachaneni K., Ragan-Kelley J., Bosboom J., O'Reilly U.-M., Amarasinghe S. OpenTuner: An Extensible Framework for Program Autotuning. Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT '14), 2014, pp. 303-315.
[5] Ansible, Ansible Playbook.
[6] Apache, Spark Configuration.
[7] Apache, Apache Flink.
[8] Bao L., Liu X., Xu Z., Fang B. AutoConfig: Automatic Configuration Tuning for Distributed Message Systems. Proceedings of the 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE '18), 2018, pp. 29-40.
[9] Bao L., Liu X., Wang F., Fang B. ACTGAN: Automatic Configuration Tuning for Software Systems with Generative Adversarial Networks. 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019), 2019, pp. 465-476.
[10] Bei Zhendong, 2021, IEEE Transactions on Computers.