Multi-Level Per Node Combiner (MLPNC) to Minimize MapReduce Job Latency on Virtualized Environment

Times cited: 2

Authors:

Jeyaraj, Rathinaraja [1]

Ananthanarayana, V. S. [1]

Affiliation:

[1] National Institute of Technology Karnataka, Department of Information Technology, Surathkal, India

Source:

33rd Annual ACM Symposium on Applied Computing | 2018

Keywords:

Combiner; MapReduce; Virtual Machines

DOI:

10.1145/3167132.3167149

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

Big data has made businesses and research more data-driven. Hadoop MapReduce is a cost-effective way to process huge amounts of data, and it is also offered as a cloud service running on clusters of Virtual Machines (VMs). In a Cloud Data Center (CDC), Hadoop VMs are co-located with other general-purpose VMs across racks. This multi-tenancy causes the local network bandwidth available to Hadoop VMs to vary, which directly affects MapReduce job latency: the shuffle phase alone contributes 26% to 70% of overall job latency because of the large number of intermediate records. Minimizing job latency therefore requires the Hadoop virtual cluster to secure maximum bandwidth, which in turn increases bandwidth usage cost. In this paper, we propose the "Multi-Level Per Node Combiner" (MLPNC), which reduces the number of intermediate records in the shuffle phase, thereby reducing both overall job latency and bandwidth usage cost. We evaluate MLPNC on a wordcount job against the default combiner and the Per Node Combiner (PNC), and discuss the results in terms of the number of shuffled records, shuffle latency, average merge latency, average reduce latency, average reduce task start time, and overall job latency. MLPNC achieves up to a 33% reduction in the number of intermediate records and up to a 32% reduction in average job latency compared with PNC.
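The abstract's central claim is that combining intermediate records before the shuffle cuts the number of records sent over the network. The minimal Python sketch below illustrates that effect for wordcount by counting the records a hypothetical two-node cluster would shuffle with no combiner, with one combiner per map task (roughly how Hadoop's default combiner is applied), and with one combiner over all map outputs on a node (the idea suggested by PNC's name). The cluster layout, input strings, and function names are illustrative assumptions; the sketch does not reproduce MLPNC's multi-level design, which the abstract does not describe.

```python
from collections import Counter


def map_words(split):
    """Wordcount map: emit (word, 1) for every word in one input split."""
    return [(w, 1) for w in split.split()]


def combine(pairs):
    """Combiner: sum counts per key, emitting one record per distinct word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def shuffled_records(nodes, mode):
    """Count the intermediate records that leave the map side.

    nodes: list of nodes, each a list of input splits (one split per map task).
    mode:  "none" (no combiner), "task" (one combiner per map task),
           or "node" (one combiner over all map outputs on a node).
    """
    total = 0
    for splits in nodes:
        outputs = [map_words(s) for s in splits]
        if mode == "none":
            total += sum(len(o) for o in outputs)
        elif mode == "task":
            total += sum(len(combine(o)) for o in outputs)
        elif mode == "node":
            merged = [pair for o in outputs for pair in o]
            total += len(combine(merged))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return total


# Hypothetical cluster: two VMs (nodes), each running two map tasks.
cluster = [
    ["big data big cloud", "data cloud data"],
    ["hadoop big data", "cloud hadoop cloud"],
]
for mode in ("none", "task", "node"):
    print(mode, shuffled_records(cluster, mode))
```

Running it prints 13, 10, and 7 shuffled records respectively: the wider the scope over which records are combined before the shuffle, the fewer distinct (word, count) pairs have to cross the network.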

Pages: 167-174

Number of pages: 8
