RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's Configuration

Cited by: 73
Authors
Bei, Zhendong [1]
Yu, Zhibin [1]
Zhang, Huiling [1]
Xiong, Wen [1]
Xu, Chengzhong [1]
Eeckhout, Lieven [2]
Feng, Shengzhong [1]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Ctr Cloud Comp, Shenzhen 518055, Peoples R China
[2] Univ Ghent, B-9000 Ghent, Belgium
Funding
European Research Council
Keywords
Performance tuning; MapReduce/Hadoop; system configuration; random forest; genetic algorithm
DOI
10.1109/TPDS.2015.2449299
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Hadoop is a widely used framework implementing the MapReduce programming model for large-scale data processing. However, Hadoop performance is significantly affected by the settings of its configuration parameters, and manually tuning these parameters is very time-consuming, if it is practical at all. This paper proposes an approach, called RFHOC, that automatically tunes the Hadoop configuration parameters to optimize performance for a given application running on a given cluster. RFHOC constructs two ensembles of performance models using a random-forest approach, one for the map stage and one for the reduce stage. Leveraging these models, RFHOC employs a genetic algorithm to automatically search the Hadoop configuration space. An evaluation of RFHOC using five typical Hadoop programs, each with five different input data sets, shows that it achieves an average speedup of 2.11x, and up to 7.4x, over the recently proposed cost-based optimization (CBO) approach. In addition, RFHOC's performance benefit increases with input data set size.
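The abstract describes a two-step pipeline: fit random-forest performance models for the map and reduce stages, then let a genetic algorithm search the configuration space against those models instead of against the real cluster. The Python sketch below illustrates that idea only under loudly stated assumptions. Configurations are hypothetical 4-dimensional vectors normalized to [0, 1]. The "measured" stage times come from made-up synthetic functions (`measured_map_time`, `measured_reduce_time`) standing in for profiling runs. Job time is assumed to be map time plus reduce time. The forest is a textbook random forest (bootstrap resamples plus a random feature subset at each split), and the GA is a simple elitist variant with uniform crossover and Gaussian mutation. None of the names, functions, or hyperparameters are taken from the paper.

```python
import random

DIMS = 4  # hypothetical: 4 configuration parameters, each normalized to [0, 1]

def avg(values):
    return sum(values) / len(values)

def fit_tree(X, y, depth, n_feats, rng, min_leaf=3):
    """Grow a regression tree; each split considers a random subset of features."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return avg(y)  # leaf: mean of the targets that reached it
    best = None  # (sum of squared errors, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = avg(left), avg(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return avg(y)
    _, f, t = best
    parts = ([i for i, row in enumerate(X) if row[f] <= t],
             [i for i, row in enumerate(X) if row[f] > t])
    children = [fit_tree([X[i] for i in p], [y[i] for i in p], depth - 1, n_feats, rng, min_leaf)
                for p in parts]
    return (f, t, children[0], children[1])

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=20, depth=4, seed=0):
    """Random forest: each tree is fit on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n_feats = max(1, len(X[0]) // 2)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth, n_feats, rng))
    return forest

def predict_forest(forest, x):
    return avg([predict_tree(tree, x) for tree in forest])

def genetic_search(fitness, dims, pop_size=30, generations=40, mut_rate=0.2, seed=1):
    """Elitist GA minimizing `fitness` over [0, 1]^dims."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dims)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)  # lower predicted time is better
        elite = pop[: pop_size // 3]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < mut_rate else g
                     for g in child]  # Gaussian mutation, clipped to [0, 1]
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

# Hypothetical stand-ins for profiling a job's map and reduce stages on a cluster.
def measured_map_time(c):
    return 10 + 8 * (c[0] - 0.7) ** 2 + 3 * (c[1] - 0.4) ** 2

def measured_reduce_time(c):
    return 6 + 5 * (c[2] - 0.3) ** 2 + 4 * (c[3] - 0.6) ** 2 + 2 * c[0] * c[3]

def true_job_time(c):
    return measured_map_time(c) + measured_reduce_time(c)

def tune(n_samples=60, seed=42):
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(DIMS)] for _ in range(n_samples)]
    map_model = fit_forest(X, [measured_map_time(c) for c in X], seed=seed + 1)
    reduce_model = fit_forest(X, [measured_reduce_time(c) for c in X], seed=seed + 2)

    def predicted_job_time(c):  # assumption: job time = map time + reduce time
        return predict_forest(map_model, c) + predict_forest(reduce_model, c)

    return genetic_search(predicted_job_time, DIMS), X

best, samples = tune()
print("tuned config:", [round(g, 2) for g in best])
print("true job time:", round(true_job_time(best), 2))
```

Fitting the models once and then querying them inside the GA is what keeps the search cheap in this sketch: each fitness evaluation is a few dozen tree lookups rather than a real Hadoop run.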
Pages: 1470-1483
Page count: 14