HTD: heterogeneous throughput-driven task scheduling algorithm in MapReduce

Cited by: 4
Authors
Wang, Xite [1]
Wang, Chaojin [1]
Bai, Mei [1]
Ma, Qian [1]
Li, Guanyu [1]
Affiliations
[1] Dalian Maritime Univ, Informat Sci & Technol Coll, Dalian, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
MapReduce; Scheduling; Heterogeneous; Throughput; CLUSTERS; LOCALITY; SKEW;
DOI
10.1007/s10619-021-07375-6
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
As one of the most popular parallel data processing models, the data analysis system MapReduce has been widely used in many fields. Task scheduling is the core module of a MapReduce system, and the quality of the scheduling algorithm directly affects the system's processing capacity. Because new nodes must be continuously added to a cluster to increase its processing capacity, the cluster naturally becomes heterogeneous. Heterogeneous environments are common in practical application scenarios, but there has been little research on task scheduling in such environments. This paper therefore presents an in-depth study of task scheduling in heterogeneous environments and proposes a new task scheduling algorithm, HTD. First, we give a formal definition of the throughput-driven task scheduling problem in a heterogeneous environment. Second, we design the scheduling algorithm HTD, which quickly obtains the completion sequence of a set of jobs and optimizes the task scheduling details in a heterogeneous environment. Finally, a series of experiments demonstrates the efficiency and effectiveness of the algorithm.
Pages: 135-163
Number of pages: 29
Related papers
29 in total
[1]  
Ahmad F, 2012, ASPLOS XVII: SEVENTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, P61
[2]  
[Anonymous], 2008, OSDI
[3]   Effectively and Efficiently Designing and Querying Parallel Relational Data Warehouses on Heterogeneous Database Clusters: The F&A Approach [J].
Bellatreche, Ladjel ;
Cuzzocrea, Alfredo ;
Benkrid, Soumia .
JOURNAL OF DATABASE MANAGEMENT, 2012, 23 (04) :17-51
[4]   Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy [J].
Benifa, J. V. Bibal ;
Dejey .
WIRELESS PERSONAL COMMUNICATIONS, 2017, 95 (03) :2709-2733
[5]  
Bo Wang, 2015, 2015 IEEE Conference on Computer Communications (INFOCOM). Proceedings, P1328, DOI 10.1109/INFOCOM.2015.7218509
[6]   Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing [J].
Camacho-Rodriguez, Jesus ;
Chauhan, Ashutosh ;
Gates, Alan ;
Koifman, Eugene ;
O'Malley, Owen ;
Garg, Vineet ;
Haindrich, Zoltan ;
Shelukhin, Sergey ;
Jayachandran, Prasanth ;
Seth, Siddharth ;
Jaiswal, Deepak ;
Bouguerra, Slim ;
Bangarwa, Nishant ;
Hariappan, Sankar ;
Agarwal, Anishek ;
Dere, Jason ;
Dai, Daniel ;
Nair, Thejas ;
Dembla, Nita ;
Vijayaraghavan, Gopal ;
Hagleitner, Guenther .
SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019, :1773-1786
[7]   MapReduce Scheduling for Deadline-Constrained Jobs in Heterogeneous Cloud Computing Systems [J].
Chen, Chien-Hung ;
Lin, Jenn-Wei ;
Kuo, Sy-Yen .
IEEE TRANSACTIONS ON CLOUD COMPUTING, 2018, 6 (01) :127-140
[8]   Energy- and locality-efficient multi-job scheduling based on MapReduce for heterogeneous datacenter [J].
Chen, Lei ;
Liu, Zhao-Hua .
SERVICE ORIENTED COMPUTING AND APPLICATIONS, 2019, 13 (04) :297-308
[9]   Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning [J].
Cheng, Dazhao ;
Rao, Jia ;
Guo, Yanfei ;
Jiang, Changjun ;
Zhou, Xiaobo .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (03) :774-786
[10]   Distribution-Based Query Scheduling [J].
Chi, Yun ;
Hacigumus, Hakan ;
Hsiung, Wang-Pin ;
Naughton, Jeffrey F. .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (09) :673-684