Load Balancing Approach for a MapReduce Job Running on a Heterogeneous Hadoop Cluster

Cited by: 8

Authors:

Bawankule, Kamalakant Laxman [1]

Dewang, Rupesh Kumar [1]

Singh, Anil Kumar [1]

Affiliations:

[1] Motilal Nehru National Institute of Technology Allahabad, Department of Computer Science and Engineering, Prayagraj, India

Source:

DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, ICDCIT 2021 | 2021 / Vol. 12582

Keywords:

Heterogeneous cluster; Hadoop; Load balancing; MapReduce; Reduce tasks; Data placement

DOI:

10.1007/978-3-030-65621-8_19

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Hadoop MapReduce has become the de facto standard in today's big data world for processing large data sets on a distributed cluster of commodity hardware. Today, the computing nodes in a commodity cluster do not all have the same hardware configuration, which leads to heterogeneity. Heterogeneity has become common in industry, research institutes, and academia. Our study shows that the current rules for calculating the required number of Reduce tasks (Reducers) for a MapReduce job are flawed, leading to significant overutilization of computing resources and degrading the performance of MapReduce jobs running on a heterogeneous Hadoop cluster. However, there is no definite answer to the question: what is the optimal number of Reduce tasks for a MapReduce job to achieve Hadoop's best performance in a heterogeneous cluster? We propose a new rule that accurately determines the required number of Reduce tasks for a MapReduce job running on a heterogeneous Hadoop cluster. The proposed rule balances the load among the heterogeneous nodes in the Reduce phase of MapReduce. It also minimizes the overutilization of computing resources and improves MapReduce job execution time by an average of 18% and 28% for the TeraSort and PageRank applications, respectively, running on a heterogeneous Hadoop cluster.

Citation

Pages: 289-298

Page count: 10

References (18 in total)

[1] Ahmad F, 2012, ASPLOS XVII: SEVENTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, P61

[2] MRA++: Scheduling and data placement on MapReduce for heterogeneous environments [J]. Anjos, Julio C. S.; Carrera, Ivan; Kolberg, Wagner; Tibola, Andre Luis; Arantes, Luciana B.; Geyer, Claudio R. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2015, 42: 22-35

[3] [Anonymous], 2012, Hadoop: The Definitive Guide

[4] [Anonymous], 2003, SOSP

[5] MapReduce: Simplified data processing on large clusters [J]. Dean, Jeffrey; Ghemawat, Sanjay. COMMUNICATIONS OF THE ACM, 2008, 51 (01): 107-113

[6] Gandhi Rohan, 2013, Proceedings of USENIX ATC '13: 2013 USENIX Annual Technical Conference. ATC '13, P61

[7] Dynamic Workload Balancing for Hadoop MapReduce [C]. Hou, Xiaofei; Kumar, Ashwin T. K.; Thomas, Johnson P.; Varadharaj, Vijay. 2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014: 56-62

[8] Huang SS, 2010, I C DATA ENGIN WORKS, P41, DOI 10.1109/ICDEW.2010.5452747

[9] Kwon YongChul, 2012, P ACM SIGMOD

[10] A Dynamic Data Placement Strategy for Hadoop in Heterogeneous Environments [J]. Lee, Chia-Wei; Hsieh, Kuang-Yu; Hsieh, Sun-Yuan; Hsiao, Hung-Chang. BIG DATA RESEARCH, 2014, 1: 14-22
