Multi-Level Per Node Combiner (MLPNC) to Minimize MapReduce Job Latency on Virtualized Environment

Cited by: 2
Authors
Jeyaraj, Rathinaraja [1]
Ananthanarayana, V. S. [1]
Affiliation
[1] Natl Inst Technol Karnataka, Dept Informat Technol, Surathkal, India
Source
33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING | 2018
Keywords
Combiner; MapReduce; Virtual Machines;
DOI
10.1145/3167132.3167149
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Big data has made businesses and research more data-driven. Hadoop MapReduce is a cost-effective way to process huge amounts of data and is also offered as a cloud service on clusters of Virtual Machines (VMs). In a Cloud Data Center (CDC), Hadoop VMs are co-located with other general-purpose VMs across racks. Such multi-tenancy leads to varying local network bandwidth availability for Hadoop VMs, which directly impacts MapReduce job latency, because the shuffle phase of the MapReduce execution sequence alone contributes 26%-70% of overall job latency owing to the large number of intermediate records. A Hadoop virtual cluster therefore needs guaranteed maximum bandwidth to minimize job latency, but this also increases the bandwidth usage cost. In this paper, we propose the "Multi-Level Per Node Combiner" (MLPNC), which curtails the number of intermediate records in the shuffle phase, resulting in a reduction in overall job latency. It also minimizes bandwidth usage cost. We evaluate MLPNC on a wordcount job against the default combiner and the Per Node Combiner (PNC). We discuss the results in terms of the number of shuffled records, shuffle latency, average merge latency, average reduce latency, average reduce task start time, and overall job latency. Finally, we argue in favor of MLPNC, as it achieves up to a 33% reduction in the number of intermediate records and up to a 32% reduction in average job latency compared with PNC.
Pages: 167-174
Page count: 8