mrMoulder: A recommendation-based adaptive parameter tuning approach for big data processing platform

Cited by: 22
Authors
Cai, Lin [1]
Qi, Yong [1]
Wei, Wei [2]
Wu, Jinsong [3]
Li, Jingwei [1]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Xian 710049, Shaanxi, Peoples R China
[2] Xian Univ Technol, Sch Comp Sci & Engn, Xian 710048, Shaanxi, Peoples R China
[3] Univ Chile, Dept Elect Engn, Av Tupper 2007, Santiago 8370451, Chile
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2019 / Vol. 93
Funding
National Natural Science Foundation of China;
Keywords
Big data processing; Performance optimization; Parameter tuning; Online configuration recommendation; Collaborative filtering; BENCHMARK SUITE; OPTIMIZATION;
DOI
10.1016/j.future.2018.05.080
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
The world has entered the big data era. Big data processing platforms such as Hadoop and Spark are increasingly adopted by applications, and they expose numerous parameters that platform operators can tune to improve processing performance. However, because these parameters are numerous and interact in complex ways, tuning them manually is very time-consuming. The challenge is therefore to configure parameters automatically, and as quickly as possible, to optimize the performance of the current job. Existing auto-tuning methods often spend a certain amount of time before the job runs to find the optimal configuration, which increases the job's total processing time and reduces the overall efficiency of the cluster. In this paper, we propose an adaptive tuning framework, mrMoulder, that recommends a near-optimal configuration for a new job in a short time. mrMoulder uses a self-extending configuration repository and a collaborative-filtering-based recommendation engine to speed up the optimization of parameter configurations. We have deployed mrMoulder in a Hadoop cluster, and the experimental results demonstrate that, for a new big data application, the recommendation time of mrMoulder is only 20% to 30% of that of existing auto-tuning methods, while the recommendation quality remains almost unchanged. (C) 2018 Elsevier B.V. All rights reserved.
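The abstract names two components, a self-extending configuration repository and a collaborative-filtering recommendation engine, without giving their details. The following is only a minimal sketch of how a neighbourhood-based collaborative filter could recommend a configuration for a new job; it is not the paper's algorithm. The configuration identifiers (`c1`, `c2`, ...), the "speedup over the default configuration" score, the distance-based similarity, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: not mrMoulder's actual method.
# repository: {job_name: {config_id: score}}, where score is a hypothetical
# "speedup over the default configuration" (higher is better).

def similarity(a, b):
    """Similarity of two jobs from the configurations both were measured on."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_gap = sum(abs(a[c] - b[c]) for c in shared) / len(shared)
    return 1.0 / (1.0 + mean_gap)

def predict_scores(repository, observed):
    """Predict scores for configurations the new job has not tried yet,
    as a similarity-weighted average over past jobs."""
    weighted, weights = {}, {}
    for past in repository.values():
        sim = similarity(observed, past)
        if sim == 0.0:
            continue
        for config, score in past.items():
            if config in observed:
                continue
            weighted[config] = weighted.get(config, 0.0) + sim * score
            weights[config] = weights.get(config, 0.0) + sim
    return {c: weighted[c] / weights[c] for c in weighted}

def recommend(repository, observed):
    """Return the configuration with the best predicted score, falling back
    to the best configuration already measured (or None if there is none)."""
    predicted = predict_scores(repository, observed)
    if predicted:
        return max(predicted, key=predicted.get)
    return max(observed, key=observed.get) if observed else None

def extend_repository(repository, job_name, observed, config, measured_score):
    """Store the new job's measurements so later jobs can reuse them."""
    record = dict(observed)
    record[config] = measured_score
    repository[job_name] = record

if __name__ == "__main__":
    repo = {
        "wordcount": {"c1": 1.0, "c2": 1.4, "c3": 1.1, "c4": 1.8},
        "sort": {"c1": 1.0, "c2": 0.8, "c3": 2.0, "c4": 1.2},
    }
    probe = {"c1": 1.0, "c2": 0.9}  # a few quick trial runs of the new job
    print(recommend(repo, probe))
```

In this sketch the new job is run on a small number of probe configurations, matched against past jobs in the repository, and then added to the repository once its recommended configuration has been measured, which is one way a repository could be "self-extending".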
Pages: 570-582
Page count: 13