An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications

Cited by: 13
Authors
Choi, Dongjoo [1]
Jeon, Myunghoon [1]
Kim, Namgi [1]
Lee, Byoung-Dai [1]
Affiliations
[1] Kyonggi Univ, Comp Sci Dept, Suwon 443760, South Korea
Source
IEEE SYSTEMS JOURNAL | 2018 / Vol. 12 / No. 04
Funding
National Research Foundation, Singapore;
Keywords
Data locality; Hadoop distributed file system (HDFS); MapReduce; task scheduling;
DOI
10.1109/JSYST.2017.2764481
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
In general, Hadoop improves task scheduling performance by determining data locality based on the locations at which the input splits and MapTasks are executed. However, if an input split consists of multiple data blocks that are distributed and stored on different nodes, this method fails to cope with the degradation in processing performance caused by the increased frequency of data block copying. We propose a task scheduling algorithm that solves this issue by defining a method to classify data locality that takes into account the location of all data blocks that comprise an input split, categorizing tasks based on the defined method, and sequentially assigning tasks according to a given priority. This study measures the performance of the proposed algorithm by comparing its total processing time, MapTask execution time, and data block copying frequency with those of Hadoop's default task scheduling algorithm. The test results show that, compared to the default algorithm, the proposed algorithm reduced the total processing time by up to 25% and the data block copying frequency by up to 28%.
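The idea described in the abstract, classifying locality over every data block of an input split and then assigning tasks by priority, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not state the actual locality classes or priority rules, so the three classes (`ALL_LOCAL`, `PARTIAL_LOCAL`, `NON_LOCAL`), the tie-break on remote block count, and all names below (`MapTask`, `classify`, `pick_task`) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet, List, Optional, Tuple


class Locality(IntEnum):
    """Hypothetical locality classes; a lower value means a higher priority."""
    ALL_LOCAL = 0      # every block of the split is stored on the node
    PARTIAL_LOCAL = 1  # some, but not all, blocks are stored on the node
    NON_LOCAL = 2      # no block of the split is stored on the node


@dataclass(frozen=True)
class MapTask:
    task_id: str
    # One set of host names per data block in the task's input split.
    block_hosts: Tuple[FrozenSet[str], ...]


def local_block_count(task: MapTask, node: str) -> int:
    """Count the blocks of the task's input split that are stored on `node`."""
    return sum(1 for hosts in task.block_hosts if node in hosts)


def classify(task: MapTask, node: str) -> Locality:
    """Classify locality by looking at all blocks of the split, not just one."""
    local = local_block_count(task, node)
    if local == len(task.block_hosts):
        return Locality.ALL_LOCAL
    if local > 0:
        return Locality.PARTIAL_LOCAL
    return Locality.NON_LOCAL


def pick_task(pending: List[MapTask], node: str) -> Optional[MapTask]:
    """Return the pending task with the best locality class for `node`.

    Assumed tie-break: among tasks of the same class, prefer the one with the
    fewest remote blocks, since each remote block implies one block copy.
    """
    if not pending:
        return None
    return min(
        pending,
        key=lambda t: (
            classify(t, node),
            len(t.block_hosts) - local_block_count(t, node),
        ),
    )


tasks = [
    MapTask("t1", (frozenset({"n1"}), frozenset({"n2"}))),
    MapTask("t2", (frozenset({"n1"}), frozenset({"n1", "n3"}))),
    MapTask("t3", (frozenset({"n2"}), frozenset({"n3"}))),
]
chosen = pick_task(tasks, "n1")
print(chosen.task_id, classify(chosen, "n1").name)
```

In this toy setup, node `n1` holds both blocks of `t2` but only one block of `t1` and none of `t3`, so a scheduler that checks only a single block location could treat `t1` and `t2` as equally local, whereas the per-split classification separates them.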
Pages: 3346-3357
Page count: 12
Related papers
50 in total
  • [31] Taming Big Data SVM with Locality-Aware Scheduling
    Ye, Mao
    Wang, Jun
    Yin, Jiangling
    Han, Dezhi
    2016 FOURTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD 2016), 2016, : 37 - 44
  • [32] Improved Particle Optimization Algorithm Solving Hadoop Task Scheduling Problem
    Xu, Jun
    Tang, Yong
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND COGNITIVE INFORMATICS, 2015, : 11 - 14
  • [33] Network-Aware Locality Scheduling for Distributed Data Operators in Data Centers
    Cheng, Long
    Wang, Ying
    Liu, Qingzhi
    Epema, Dick H. J.
    Liu, Cheng
    Mao, Ying
    Murphy, John
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (06) : 1494 - 1510
  • [34] An efficient deadline constrained and data locality aware dynamic scheduling framework for multitenancy clouds
    Ru, Jia
    Yang, Yun
    Grundy, John
    Keung, Jacky
    Hao, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (05):
  • [35] EnLoc: Data Locality-aware Energy-efficient Scheduling Scheme for Cloud Data Centers
    Kaur, Kuljeet
    Kumar, Neeraj
    Garg, Sahil
    Rodrigues, Joel J. P. C.
    2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,
  • [36] Application and Storage-Aware Data Placement and Job Scheduling for Hadoop Clusters
    Li, Tao
    He, Shuibing
    Chen, Ping
    Yang, Siling
    Yin, Yanlong
    Xu, Cheng
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2020, 29 (16)
  • [37] A Genetic Algorithm for Energy Aware Task Scheduling in Heterogeneous Systems
    Lin, Man
    Ng, Sai Man
    PARALLEL PROCESSING LETTERS, 2005, 15 (04) : 439 - 449
  • [38] GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers
    Convolbo, Moise W.
    Chou, Jerry
    Hsu, Ching-Hsien
    Chung, Yeh Ching
    COMPUTING, 2018, 100 (01) : 21 - 46
  • [39] A constructive algorithm for memory-aware task assignment and scheduling
    Szymanek, R
    Kuchcinski, K
    PROCEEDINGS OF THE NINTH INTERNATIONAL SYMPOSIUM ON HARDWARE/SOFTWARE CODESIGN, 2001, : 147 - 152
  • [40] Storage-aware Task Scheduling for Performance Optimization of Big Data Workflows
    Ye, Qianwen
    Wu, Chase Q.
    Cao, Huiyan
    Rao, Nageswara S. V.
    Hou, Aiqin
    2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS, 2018, : 1095 - 1102