Comparison and Improvement of Hadoop MapReduce Performance Prediction Models in the Private Cloud

Cited by: 5
Authors
Wang, Nini [1]
Yang, Jian [1]
Lu, Zhihui [1]
Li, Xiaoyan [1]
Wu, Jie [2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Auditing & Monitoring Minist Educ, Engn Res Ctr Cyber Secur, Shanghai 200433, Peoples R China
Source
ADVANCES IN SERVICES COMPUTING | 2016 / Vol. 10065
Keywords
Big data; Hadoop; Private cloud; MapReduce; Performance prediction model; Job estimation
DOI
10.1007/978-3-319-49178-3_6
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Performance modeling for MapReduce applications that process large-scale data is an important issue in the optimization, evaluation, prediction, and resource scheduling of jobs on big data and cloud computing platforms. In this paper, we study the Hadoop distributed computing framework, a currently prevailing Big Data solution. We use the locally weighted linear regression (LWLR) and linear regression (LR) algorithms to build three kinds of models, each based on different characteristics, that estimate the execution time of large-scale data applications running on the Hadoop framework, and we compare and improve the three models. By building different types of experimental environments and running different types of jobs, we conclude that all three models perform well in predicting the execution time and evaluating the performance of large-scale data applications using small-scale data.
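The abstract names LWLR and LR but this record does not give the models' features, kernel, or parameters. As a rough illustration only, the following sketch assumes a single feature (input data size), a Gaussian kernel, and a bandwidth parameter `tau`; the function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def _weighted_line_fit(xs, ys, weights):
    """Fit y = intercept + slope * x by weighted least squares."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def lr_predict(x_query, xs, ys):
    """Ordinary linear regression: every sample has the same weight."""
    intercept, slope = _weighted_line_fit(xs, ys, [1.0] * len(xs))
    return intercept + slope * x_query


def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Locally weighted linear regression with an assumed Gaussian kernel.

    Samples close to x_query get weights near 1, distant samples get
    weights near 0, and a separate line is fitted for each query point.
    A very small tau can drive all weights to zero far from the data.
    """
    weights = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2))
               for x in xs]
    intercept, slope = _weighted_line_fit(xs, ys, weights)
    return intercept + slope * x_query


# Made-up sample: input size in GB and measured job time in seconds.
sizes_gb = [1.0, 2.0, 4.0, 8.0, 16.0]
times_s = [5.0, 7.0, 11.0, 19.0, 35.0]

estimate_lr = lr_predict(6.0, sizes_gb, times_s)
estimate_lwlr = lwlr_predict(6.0, sizes_gb, times_s, tau=4.0)
```

With a small `tau` the LWLR fit follows local curvature in the samples, while a very large `tau` makes it behave like plain LR, so the bandwidth would need to be tuned on held-out job runs.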
Pages: 77-91
Number of pages: 15