An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

Cited by: 0
Authors
Zhao, Hui [1]
Yang, Shuqiang [2]
Fan, Hua [1]
Chen, Zhikun [1]
Xu, Jinghu [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Changsha, Hunan, Peoples R China
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2013 / Vol. E96D / No. 12
Funding
National Natural Science Foundation of China;
Keywords
data-intensive computation; MapReduce; Hadoop; algorithm design; scheduling; grid computing; data locality; cloud computing; flowtime;
DOI
10.1587/transinf.E96.D.2654
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors for improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. Several schedulers are in use for the popular open-source Hadoop MapReduce implementation; however, they cannot optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Since this is a strongly NP-hard problem, we adopt effective heuristics to seek a satisfactory solution. We formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality task scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
Pages: 2654-2662
Number of pages: 9