ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms

Cited by: 12
Authors
Ding, Linlin [1]
Wang, Guoren [1]
Xin, Junchang [1]
Wang, Xiaoyang [1]
Huang, Shan [1]
Zhang, Rui [1]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia
Funding
National Natural Science Foundation of China
Keywords
MapReduce; Hadoop; Communication mechanism
DOI
10.1016/j.datak.2013.04.004
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As a parallel programming framework, MapReduce can process scalable and parallel applications with large-scale datasets. Mappers and Reducers execute independently of each other: there is no communication among Mappers, nor among Reducers. When the final result is much smaller than the original data, processing the unpromising intermediate data wastes time. We observe that this waste can be significantly reduced by simple communication mechanisms that enhance the performance of MapReduce. In this paper, we propose ComMapReduce, an efficient framework that extends and improves MapReduce for big data applications in the cloud. ComMapReduce obtains certain shared information through efficient, lightweight communication mechanisms. Three basic communication strategies (Lazy, Eager and Hybrid) and two optimization communication strategies (Prepositive and Postpositive) are proposed to obtain the shared information and process big data applications effectively. We also illustrate the implementations of three typical applications with large-scale datasets on ComMapReduce. Our extensive experiments demonstrate that ComMapReduce outperforms MapReduce in all metrics without affecting the existing characteristics of MapReduce. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
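The abstract's central observation, that a small piece of shared information lets Mappers discard unpromising intermediate data, can be illustrated with a toy top-k query. The sketch below is an assumption-laden illustration, not the paper's implementation: it is plain Python rather than Hadoop code, it runs the simulated mappers one after another, and the names `run_without_sharing`, `run_with_sharing` and `shared_threshold` are hypothetical. It does not model the Lazy, Eager, Hybrid, Prepositive or Postpositive strategies, whose details the abstract does not give.

```python
import heapq


def local_top_k(records, k):
    """Return the k largest values of one input split, in descending order."""
    return heapq.nlargest(k, records)


def run_without_sharing(splits, k):
    """Plain MapReduce style: every mapper emits its full local top-k."""
    intermediate = []
    for split in splits:
        intermediate.extend(local_top_k(split, k))
    # The single "reducer" merges all intermediate values.
    return heapq.nlargest(k, intermediate), len(intermediate)


def run_with_sharing(splits, k):
    """Mappers consult a shared threshold (hypothetical) before emitting."""
    shared_threshold = None  # assumed shared filter value, initially unknown
    intermediate = []
    for split in splits:
        top = local_top_k(split, k)
        if shared_threshold is None:
            emitted = top
        else:
            # Values below the threshold cannot reach the global top-k,
            # because some earlier mapper already emitted k values >= it.
            emitted = [v for v in top if v >= shared_threshold]
        intermediate.extend(emitted)
        # A mapper holding k values may raise the threshold to its k-th largest.
        if len(top) == k and (shared_threshold is None or top[-1] > shared_threshold):
            shared_threshold = top[-1]
    return heapq.nlargest(k, intermediate), len(intermediate)


if __name__ == "__main__":
    splits = [[90, 85, 10, 5], [20, 15, 95, 1], [3, 2, 88, 4]]
    print(run_without_sharing(splits, 2))
    print(run_with_sharing(splits, 2))
```

In this toy run both variants return the same top-2 answer, while the variant with the shared threshold passes fewer intermediate values to the merge step. The saving grows when the final result is small relative to the input, which is the case the abstract targets.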
Pages: 224-247
Page count: 24