An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

Cited by: 0
Authors
Zhao, Hui [1]
Yang, Shuqiang [2]
Fan, Hua [1]
Chen, Zhikun [1]
Xu, Jinghu [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Changsha, Hunan, Peoples R China
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2013, Vol. E96D, No. 12
Funding
National Natural Science Foundation of China
Keywords
data-intensive computation; MapReduce; Hadoop; algorithm design; scheduling; grid computing; data locality; cloud computing; flowtime
DOI
10.1587/transinf.E96.D.2654
Chinese Library Classification number
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent, continuously arriving MapReduce jobs. Data locality and load balancing are two important factors for improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. Several schedulers are in use for the popular open-source Hadoop MapReduce implementation; however, they cannot optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Because this is a strongly NP-hard problem, we adopt effective heuristics to seek satisfactory solutions. We formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
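The objective named in the abstract, total flowtime, can be illustrated with a small sketch. It assumes the usual definition (a job's flowtime is its completion time minus its arrival time) and a single machine that runs jobs one after another. The function `total_flowtime` and the sample jobs are hypothetical illustrations, not the paper's algorithm or data.

```python
from typing import List, Tuple

# A job is (arrival_time, processing_time); both values are made up for illustration.
Job = Tuple[float, float]


def total_flowtime(jobs_in_order: List[Job]) -> float:
    """Sum of (completion - arrival) when jobs run back to back in the given order.

    Assumption: one machine, no preemption. This is a simplification, not the
    MapReduce cluster model used in the paper.
    """
    clock = 0.0
    total = 0.0
    for arrival, processing in jobs_in_order:
        start = max(clock, arrival)  # a job cannot start before it arrives
        clock = start + processing   # completion time of this job
        total += clock - arrival     # this job's flowtime
    return total


jobs = [(0.0, 8.0), (0.0, 2.0), (0.0, 4.0)]

arrival_order = total_flowtime(jobs)
shortest_first = total_flowtime(sorted(jobs, key=lambda job: job[1]))

print(arrival_order)   # 32.0
print(shortest_first)  # 22.0
```

With all jobs arriving at time 0, running the shortest job first lowers the total from 32 to 22, which suggests why the order in which jobs are selected matters for this objective.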
Pages: 2654-2662
Page count: 9