Hadoop Performance Self-Tuning Using a Fuzzy-Prediction Approach

Cited by: 8
Authors
Lee, Gil Jae [1]
Fortes, Jose A. B. [1]
Affiliation
[1] Univ Florida, Gainesville, FL 32611 USA
Source
2016 IEEE International Conference on Autonomic Computing (ICAC) | 2016
Keywords
Performance tuning; Apache Hadoop; YARN; Self-tuning; Autonomic computing; Fuzzy prediction;
DOI
10.1109/ICAC.2016.52
Chinese Library Classification (CLC) number
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
The Apache Hadoop framework (currently known as YARN) is a widely used open-source implementation of MapReduce (MR). Manual tuning of Hadoop performance is hard and time-consuming, so several self-tuning approaches have been proposed. This paper proposes an approach that avoids three problems of previous self-tuning approaches based on performance models or resource usage: (1) the need for a time-consuming training phase, typically offline; (2) unsuitability for Hadoop environments with concurrently running MR jobs; and (3) the need to modify the Hadoop framework itself. The proposed approach uses a fuzzy-prediction controller to self-optimize the number of concurrent MR jobs. The controller learns from the past and current resource usage of MR jobs and from the number of concurrent tasks. It both uses and constructs rules in real time to predict the resource usage and the number of concurrent tasks, and it requires neither offline training nor any modification of the MR jobs or the Hadoop framework. The predicted values are used to dynamically control the number of concurrent ApplicationMasters (AMs), i.e., MR jobs in the RUNNING state. Experimental evaluation on a 7-node cluster (1 master node and 6 slave nodes) running 30-job sequences that combine three types of MR jobs (Terasort, Grep and Wordcount) showed up to 29% performance improvement over the Hadoop default configuration. The new approach improves the aggregate performance of MR jobs by adjusting a single YARN parameter.
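The control loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' controller: the three fuzzy labels, the triangular membership functions, the target utilization of 0.8, and the names `FuzzyPredictor` and `next_job_limit` are all assumptions chosen for illustration. It only shows the general idea of building prediction rules online from observed resource usage and using the prediction to raise or lower a concurrent-job limit.

```python
from collections import defaultdict

# Hypothetical fuzzy labels for a utilization value in [0, 1].
CENTERS = {"low": 0.0, "medium": 0.5, "high": 1.0}


def memberships(x):
    """Triangular membership degree of x for each label (assumed shapes)."""
    x = min(max(x, 0.0), 1.0)
    return {label: max(0.0, 1.0 - abs(x - c) / 0.5) for label, c in CENTERS.items()}


class FuzzyPredictor:
    """Learns 'if usage is <label> then next usage is about <value>' rules online."""

    def __init__(self):
        self.weighted_sum = defaultdict(float)
        self.weight = defaultdict(float)

    def learn(self, usage, next_usage):
        # Update every rule whose antecedent matches the observed usage.
        for label, degree in memberships(usage).items():
            self.weighted_sum[label] += degree * next_usage
            self.weight[label] += degree

    def predict(self, usage):
        # Membership-weighted average of the consequents of the matching rules.
        num = den = 0.0
        for label, degree in memberships(usage).items():
            if degree > 0 and self.weight[label] > 0:
                num += degree * self.weighted_sum[label] / self.weight[label]
                den += degree
        # With no applicable rule yet, fall back to the current usage.
        return num / den if den > 0 else usage


def next_job_limit(current_limit, predicted_usage, target=0.8, band=0.1,
                   min_limit=1, max_limit=16):
    """Step the concurrent-job limit toward the assumed target utilization."""
    if predicted_usage > target + band:
        return max(min_limit, current_limit - 1)
    if predicted_usage < target - band:
        return min(max_limit, current_limit + 1)
    return current_limit
```

In a real deployment, the value returned by `next_job_limit` would have to be mapped onto whichever YARN setting governs the number of concurrent ApplicationMasters. The abstract does not name that parameter, so the sketch leaves that step out.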
Pages: 55-64
Page count: 10