Hadoop Characterization

Cited by: 5
Authors
Alzuru, Icaro [1]
Long, Kevin [2]
Li, Tao [3]
Zimmerman, David [2]
Gowda, Bhaskar [4]
Affiliations
[1] Univ Florida, CISE, Gainesville, FL 32611 USA
[2] Intel Corp, Folsom, CA USA
[3] Univ Florida, ECE, Gainesville, FL USA
[4] Intel Corp, Hillsboro, OR 97124 USA
Source
2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 2 | 2015
Keywords
hadoop; big data; characterization; power consumption; workloads; benchmarks; hibench; big-bench
DOI
10.1109/Trustcom.2015.567
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the last decade, Warehouse Scale Computers (WSC) have grown in number and capacity while Hadoop became the de facto standard framework for big data processing. Despite the existence of several benchmark suites, sizing guides, and characterization studies, there are few concrete guidelines for WSC designers and engineers who need to know how real Hadoop workloads are going to stress the different hardware subsystems of their servers. Available studies have shown execution statistics of Hadoop benchmarks but have not been able to extract meaningful and reusable results. In addition, existing sizing guides provide hardware acquisition lists without considering the workloads. In this study, we propose a simple big data workload differentiation, deliver general and specific conclusions about how demanding the different types of Hadoop workloads are for several hardware subsystems, and show how power consumption is influenced in each case. The HiBench and Big-Bench suites were used to capture real-time memory traces, and CPU, disk, and power consumption statistics of Hadoop. Our results show that CPU-intensive and disk-intensive workloads behave differently. CPU-intensive workloads consume more power and memory bandwidth, while disk-intensive workloads usually require more memory. These and other conclusions presented in the paper are expected to help WSC designers decide the hardware characteristics of their Hadoop systems and better understand the behavior of big data workloads in Hadoop.
Pages: 96-103
Page count: 8