An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications

Cited by: 13
Authors
Choi, Dongjoo [1]
Jeon, Myunghoon [1]
Kim, Namgi [1]
Lee, Byoung-Dai [1]
Affiliations
[1] Kyonggi Univ, Comp Sci Dept, Suwon 443760, South Korea
Source
IEEE SYSTEMS JOURNAL | 2018 / Vol. 12 / No. 04
Funding
National Research Foundation of Singapore;
Keywords
Data locality; Hadoop distributed file system (HDFS); MapReduce; task scheduling;
DOI
10.1109/JSYST.2017.2764481
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
In general, Hadoop improves the task scheduling performance by determining data locality based on the location in which the input splits and MapTask are executed. However, if an input split consists of multiple data blocks that are distributed and stored in different nodes, this data location method fails to cope with the degradation in processing performance due to the increased frequency of data block copying. We propose a task scheduling algorithm that solves this issue by defining a method to classify data locality taking into account the location of all data blocks that comprise an input split, categorizing tasks based on the defined method, and sequentially assigning tasks according to a given priority. This study measures the performance of the proposed algorithm through a comparison of the total processing time, MapTask performance time, and data block copying frequency between the proposed algorithm and Hadoop's default task scheduling algorithm. The test results show that the proposed algorithm improved the total processing time by up to 25% and the data block copying frequency by up to 28%, when compared to the default algorithm.
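The abstract's idea of classifying locality over every block of an input split, then assigning tasks by priority, can be sketched roughly as follows. This is a minimal illustration and not the authors' algorithm: the three locality classes, the tie-breaking rule, and all names (`Split`, `classify`, `pick_task`, `remote_blocks`) are assumptions made for the example, since the record does not give the paper's actual categories or priorities.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Split:
    """Hypothetical input split; block_nodes[i] is the set of nodes
    holding a replica of the i-th data block (assumed non-empty list)."""
    name: str
    block_nodes: List[Set[str]]


def local_fraction(split: Split, node: str) -> float:
    """Fraction of the split's blocks that have a replica on `node`."""
    local = sum(1 for replicas in split.block_nodes if node in replicas)
    return local / len(split.block_nodes)


def classify(split: Split, node: str) -> int:
    """Assumed locality classes: 0 = all blocks local, 1 = some local, 2 = none."""
    fraction = local_fraction(split, node)
    if fraction == 1.0:
        return 0
    if fraction > 0.0:
        return 1
    return 2


def remote_blocks(split: Split, node: str) -> int:
    """Number of blocks that would have to be copied to `node`."""
    return sum(1 for replicas in split.block_nodes if node not in replicas)


def pick_task(pending: List[Split], node: str) -> Optional[Split]:
    """Pick the pending split with the best (lowest) class for `node`,
    breaking ties by the larger local fraction. Returns None if empty."""
    return min(
        pending,
        key=lambda s: (classify(s, node), -local_fraction(s, node)),
        default=None,
    )


if __name__ == "__main__":
    pending = [
        Split("a", [{"n1"}, {"n2"}]),        # half local to n1
        Split("b", [{"n1"}, {"n1", "n3"}]),  # fully local to n1
        Split("c", [{"n2"}, {"n3"}]),        # nothing local to n1
    ]
    chosen = pick_task(pending, "n1")
    print(chosen.name, remote_blocks(chosen, "n1"))
```

Under these assumptions, a scheduler that looks only at where a split's first block lives would treat splits `a` and `b` alike on node `n1`, whereas counting all blocks prefers `b` and avoids one block copy.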
Pages: 3346-3357
Page count: 12