An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications

Cited by: 13
Authors
Choi, Dongjoo [1]
Jeon, Myunghoon [1]
Kim, Namgi [1]
Lee, Byoung-Dai [1]
Affiliations
[1] Kyonggi Univ, Comp Sci Dept, Suwon 443760, South Korea
Source
IEEE SYSTEMS JOURNAL | 2018 / Vol. 12 / No. 4
Funding
National Research Foundation of Singapore;
Keywords
Data locality; Hadoop distributed file system (HDFS); MapReduce; task scheduling;
DOI
10.1109/JSYST.2017.2764481
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In general, Hadoop improves the task scheduling performance by determining data locality based on the location in which the input splits and MapTask are executed. However, if an input split consists of multiple data blocks that are distributed and stored in different nodes, this data location method fails to cope with the degradation in processing performance due to the increased frequency of data block copying. We propose a task scheduling algorithm that solves this issue by defining a method to classify data locality taking into account the location of all data blocks that comprise an input split, categorizing tasks based on the defined method, and sequentially assigning tasks according to a given priority. This study measures the performance of the proposed algorithm through a comparison of the total processing time, MapTask performance time, and data block copying frequency between the proposed algorithm and Hadoop's default task scheduling algorithm. The test results show that the proposed algorithm improved the total processing time by up to 25% and the data block copying frequency by up to 28%, when compared to the default algorithm.
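The classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the actual locality categories or priority rules, so the three classes used here (all blocks local, some blocks local, no blocks local), the tie-break on the number of blocks that would have to be copied, and the names `InputSplit`, `classify`, and `pick_task` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class InputSplit:
    """Hypothetical model of an input split.

    block_hosts[i] is the set of node names holding a replica of block i.
    """
    split_id: str
    block_hosts: List[Set[str]]


def local_block_count(split: InputSplit, node: str) -> int:
    """Count the blocks of the split that have a replica on the given node."""
    return sum(1 for hosts in split.block_hosts if node in hosts)


def classify(split: InputSplit, node: str) -> int:
    """Assumed locality classes (lower is better):
    0 = every block is local, 1 = some blocks are local, 2 = no block is local.
    """
    local = local_block_count(split, node)
    if local == len(split.block_hosts):
        return 0
    if local > 0:
        return 1
    return 2


def pick_task(pending: List[InputSplit], node: str) -> Optional[InputSplit]:
    """Pick the pending split with the best locality class for a free node.

    Ties within a class go to the split with the fewest blocks to copy
    (an assumed tie-break, not taken from the source).
    """
    if not pending:
        return None

    def priority(split: InputSplit):
        remote = len(split.block_hosts) - local_block_count(split, node)
        return (classify(split, node), remote)

    return min(pending, key=priority)


# Example: three two-block splits spread over nodes n1..n3.
pending = [
    InputSplit("s1", [{"n1", "n2"}, {"n3"}]),
    InputSplit("s2", [{"n1"}, {"n1", "n3"}]),
    InputSplit("s3", [{"n2"}, {"n3"}]),
]
chosen = pick_task(pending, "n1")
print(chosen.split_id if chosen else None)
```

In this example only `s2` has both of its blocks on `n1`, so it is the split chosen for that node. A scheduler that looked at a single block location per split could not distinguish `s1` from `s2` here, which is the situation the abstract describes as causing extra block copying.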
Pages: 3346-3357
Page count: 12