Analysis of Big Data Platform with OpenStack and Hadoop

Cited by: 5
Authors
Li, Xiaoyan [1]
Lu, Zhihui [1]
Wang, Nini [2]
Wu, Jie [2]
Huang, Shalin [3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Minist Educ, Engn Res Ctr Cyber Secur Auditing & Monitoring, Shanghai 200433, Peoples R China
[3] Wangsu Sci & Technol Co Ltd, Shanghai 200433, Peoples R China
Source
ADVANCES IN SERVICES COMPUTING | 2016 / Vol. 10065
Keywords
Hadoop; Benchmarks; Big data; HDFS; Cluster; OpenStack; Cloud; Performance; MapReduce
DOI
10.1007/978-3-319-49178-3_29
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the era of big data, cloud infrastructure needs to strongly support big data. As a distributed computational framework, Hadoop is one of the de facto leading software tools for solving big data problems. Cloud infrastructure has been proven to support three-tier architecture applications well. In this paper, we construct a Hadoop big data platform on an OpenStack cloud. We design three experimental scenarios, carry out a set of experiments using the standard Hadoop benchmarks TestDFSIO, TeraSort and PI, and examine the performance. Our experiments reveal that the disk read operations of physical servers can be a bottleneck for TestDFSIO and TeraSort. Allocating VMs more widely across physical servers achieves better performance for the read jobs of TestDFSIO and TeraSort. For the CPU-intensive job PI, the best practice is to centralize the allocation of VMs on physical machines.
Pages: 375-390
Number of pages: 16