Historical data based approach to mitigate stragglers from the Reduce phase of MapReduce in a heterogeneous Hadoop cluster

Times cited: 9
Authors
Bawankule, Kamalakant Laxman [1]
Dewang, Rupesh Kumar [1]
Singh, Anil Kumar [1]
Affiliation
[1] Motilal Nehru National Institute of Technology Allahabad, Department of Computer Science and Engineering, Prayagraj, Uttar Pradesh, India
Source
Cluster Computing: The Journal of Networks, Software Tools and Applications | 2022 / Vol. 25 / Issue 5
Keywords
Data locality based scheduler (DLBS); Hadoop; Historical data based Reduce tasks scheduling (HDRTS); Heterogeneous environment; MapReduce; Node average response time (NART); Total average response time (TART); Heterogeneous cluster; Performance
DOI
10.1007/s10586-021-03530-x
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hadoop MapReduce processes data on a cluster of commodity hardware (nodes) in two phases, using Map and Reduce tasks. Yet Another Resource Negotiator (YARN), a dynamic resource manager, allocates resources for Map tasks while preserving data locality, but it may allocate resources for Reduce tasks on any node in the cluster. This policy performs well in a homogeneous environment, where the nodes' computing capabilities are identical. In a heterogeneous environment, however, its performance degrades: containers for Reduce tasks may be placed on slower nodes, which slows Reduce task execution and leads to computational skew. To mitigate computational skew in the Reduce phase of MapReduce, we propose the Historical data based Reduce tasks scheduling (HDRTS) technique. The technique consists of two algorithms: the first computes the node average response time (NART) of each node by interpreting job history information, and the second allocates resources on a faster processing node (FPN) to schedule the Reduce tasks. To evaluate the policy's performance, we used the popular HiBench benchmark suite. Compared with Hadoop's default policy and several other policies, the proposed HDRTS policy reduces the Reduce task execution time of reduce-input-heavy jobs by roughly 25% to 37%. It thereby mitigates computational skew and stragglers in the Reduce phase of MapReduce in heterogeneous environments.
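The abstract names the two HDRTS algorithms without giving their details. The Python sketch below illustrates one plausible reading of them, and every specific in it is an assumption rather than the authors' method: the `TaskRecord` history schema is hypothetical (it is not Hadoop's job history format); a task's response time is taken as finish time minus start time; TART is assumed to be the mean of the per-node NART values; a node is treated as a faster processing node when its NART does not exceed TART; and when no FPN has a free container, the sketch falls back to any free node, mirroring the default policy. All function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One finished task parsed from job history (hypothetical, simplified schema)."""
    node: str
    start_ms: int
    finish_ms: int


def node_average_response_time(history):
    """Algorithm 1 sketch: mean task response time per node (NART)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for rec in history:
        totals[rec.node] += rec.finish_ms - rec.start_ms
        counts[rec.node] += 1
    return {node: totals[node] / counts[node] for node in totals}


def total_average_response_time(nart):
    """TART, assumed here to be the mean of the per-node NART values."""
    return sum(nart.values()) / len(nart)


def faster_processing_nodes(nart):
    """Assumed FPN criterion: nodes whose NART is at or below TART."""
    tart = total_average_response_time(nart)
    return sorted(node for node, t in nart.items() if t <= tart)


def place_reduce_container(nart, free_nodes):
    """Algorithm 2 sketch: choose a node for a Reduce task's container.

    Prefers free FPNs; if none is free, falls back to any free node
    (assumed fallback). Returns None when no node is free.
    """
    fpn = set(faster_processing_nodes(nart))
    candidates = [n for n in free_nodes if n in fpn] or list(free_nodes)
    if not candidates:
        return None
    # Pick the candidate with the lowest NART; nodes with no history rank last.
    return min(candidates, key=lambda n: nart.get(n, float("inf")))


if __name__ == "__main__":
    history = [
        TaskRecord("node1", 0, 100),
        TaskRecord("node1", 0, 120),
        TaskRecord("node2", 0, 300),
        TaskRecord("node3", 0, 150),
    ]
    nart = node_average_response_time(history)
    print(nart)                                              # {'node1': 110.0, 'node2': 300.0, 'node3': 150.0}
    print(faster_processing_nodes(nart))                     # ['node1', 'node3']
    print(place_reduce_container(nart, ["node2", "node3"]))  # node3
```

Ranking the free FPNs by NART, rather than taking any FPN, is a design choice of this sketch; the paper's actual selection rule may differ.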
Pages: 3193-3211
Number of pages: 19