An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

Cited by: 0
Authors
Zhao, Hui [1]
Yang, Shuqiang [2]
Fan, Hua [1]
Chen, Zhikun [1]
Xu, Jinghu [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Changsha, Hunan, Peoples R China
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2013, Vol. E96D, No. 12
Funding
National Natural Science Foundation of China
Keywords
data-intensive computation; MapReduce; Hadoop; algorithm design; scheduling; grid computing; data locality; cloud computing; flowtime;
DOI
10.1587/transinf.E96.D.2654
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors for improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. Several schedulers are in use for the popular open-source Hadoop MapReduce implementation; however, they cannot optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Because this is a strongly NP-hard problem, we adopt effective heuristics to seek a satisfactory solution. We formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality task scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
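The abstract names three ingredients without giving details: a total-flowtime objective, strict data locality for map tasks, and load balancing for reduce tasks. The sketch below is only an illustration of those ideas under stated assumptions, not the authors' algorithms. It assumes flowtime means completion time minus arrival time per job, that "strict" locality means a machine never runs a map task whose input is not stored on it, and it swaps in a simple greedy least-loaded rule for reduce placement. All function names, task names, and cost values are hypothetical.

```python
from typing import Dict, List, Optional


def total_flowtime(arrivals: List[float], completions: List[float]) -> float:
    """Sum over jobs of (completion time - arrival time). Assumed definition."""
    return sum(c - a for a, c in zip(arrivals, completions))


def pick_local_map_task(machine: str,
                        pending: Dict[str, List[str]]) -> Optional[str]:
    """Strict locality (assumed meaning): return a pending map task whose
    input block has a replica on `machine`, or None so the machine waits
    instead of reading remote data."""
    for task, replica_hosts in pending.items():
        if machine in replica_hosts:
            return task
    return None


def assign_reduce_tasks(task_costs: Dict[str, float],
                        machines: List[str]) -> Dict[str, str]:
    """Greedy load balancing (a stand-in heuristic): place the costliest
    remaining reduce task on the currently least-loaded reduce machine."""
    load = {m: 0.0 for m in machines}
    placement: Dict[str, str] = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        target = min(machines, key=lambda m: load[m])
        placement[task] = target
        load[target] += cost
    return placement


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(total_flowtime([0, 1, 2], [4, 3, 7]))
    print(pick_local_map_task("n2", {"m1": ["n1", "n3"], "m2": ["n2", "n3"]}))
    print(assign_reduce_tasks({"r1": 5, "r2": 3, "r3": 2}, ["a", "b"]))
```

The greedy rule is a common baseline for balancing load across identical machines; the paper's reduce-side algorithm may weigh other factors that the abstract does not describe.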
Pages: 2654-2662
Page count: 9