An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications

Cited by: 13
Authors
Choi, Dongjoo [1]
Jeon, Myunghoon [1]
Kim, Namgi [1]
Lee, Byoung-Dai [1]
Affiliations
[1] Kyonggi Univ, Comp Sci Dept, Suwon 443760, South Korea
Source
IEEE SYSTEMS JOURNAL | 2018 / Vol. 12 / No. 4
Funding
National Research Foundation, Singapore;
Keywords
Data locality; Hadoop distributed file system (HDFS); MapReduce; task scheduling;
DOI
10.1109/JSYST.2017.2764481
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In general, Hadoop improves the task scheduling performance by determining data locality based on the location in which the input splits and MapTask are executed. However, if an input split consists of multiple data blocks that are distributed and stored in different nodes, this data location method fails to cope with the degradation in processing performance due to the increased frequency of data block copying. We propose a task scheduling algorithm that solves this issue by defining a method to classify data locality taking into account the location of all data blocks that comprise an input split, categorizing tasks based on the defined method, and sequentially assigning tasks according to a given priority. This study measures the performance of the proposed algorithm through a comparison of the total processing time, MapTask performance time, and data block copying frequency between the proposed algorithm and Hadoop's default task scheduling algorithm. The test results show that the proposed algorithm improved the total processing time by up to 25% and the data block copying frequency by up to 28%, when compared to the default algorithm.
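The idea in the abstract (classify each task by where all blocks of its input split live, then assign by priority) can be illustrated with a minimal sketch. This is not the authors' algorithm: the three locality classes, the tie-breaking rule, and the names `local_block_count`, `locality_class`, and `pick_task` are assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Optional, Set

# Hypothetical locality classes (an assumption, not taken from the paper):
# 0 = every block of the split has a replica on the node,
# 1 = some blocks do, 2 = none do. Lower is better.
ALL_LOCAL, PARTLY_LOCAL, NON_LOCAL = 0, 1, 2


def local_block_count(blocks: List[str],
                      block_locations: Dict[str, Set[str]],
                      node: str) -> int:
    """Count the blocks of one input split that have a replica on `node`."""
    return sum(1 for b in blocks if node in block_locations.get(b, set()))


def locality_class(local: int, total: int) -> int:
    """Map a local-block count to one of the three illustrative classes."""
    if total > 0 and local == total:
        return ALL_LOCAL
    if local > 0:
        return PARTLY_LOCAL
    return NON_LOCAL


def pick_task(pending: Dict[str, List[str]],
              block_locations: Dict[str, Set[str]],
              node: str) -> Optional[str]:
    """Pick the pending task with the best locality class for `node`.

    Ties are broken by the larger number of local blocks, then by task id.
    Returns None when no task is pending.
    """
    best_key = None
    best_task = None
    for task_id, blocks in pending.items():
        local = local_block_count(blocks, block_locations, node)
        key = (locality_class(local, len(blocks)), -local, task_id)
        if best_key is None or key < best_key:
            best_key, best_task = key, task_id
    return best_task
```

A node that requests work would call `pick_task` with the pending tasks and the block-to-node replica map. A scheduler that looks only at the split's nominal location would miss the partly local case, where some blocks must still be copied from other nodes.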
Pages: 3346-3357
Page count: 12