The Family of MapReduce and Large-Scale Data Processing Systems

Cited by: 104
Authors
Sakr, Sherif [1,2]
Liu, Anna [1,2]
Fayoumi, Ayman G. [3]
Affiliations
[1] NICTA, Dept Comp Sci & Engn, Sydney, NSW, Australia
[2] Univ New S Wales, Sydney, NSW, Australia
[3] King Abdulaziz Univ, Jeddah 21413, Saudi Arabia
Keywords
Design; Algorithms; Performance; MapReduce; big data; large-scale data processing; COST-BASED OPTIMIZATION; MAP-REDUCE; SIMILARITY JOINS; DATA PLACEMENT; FRAMEWORK; PERFORMANCE; QUERIES; SPARQL
DOI
10.1145/2522968.2522979
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that many follow-up research efforts have tackled since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.
Pages: 44