A comprehensive view of Hadoop research: A systematic literature review

Times cited: 72
Authors
Polato, Ivanilton [1,2]
Re, Reginaldo [2]
Goldman, Alfredo [1]
Kon, Fabio [1]
Affiliations
[1] Univ Sao Paulo, Dept Comp Sci, Sao Paulo, Brazil
[2] Univ Tecnol Fed Parana, Dept Comp Sci, Toledo, Parana, Brazil
Keywords
Systematic literature review; Apache Hadoop; MapReduce; HDFS; Survey; SCHEDULING ALGORITHM; DATA PLACEMENT; MAP-REDUCE; MAPREDUCE; PERFORMANCE; MANAGEMENT
DOI
10.1016/j.jnca.2014.07.022
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Context: In recent years, the valuable knowledge that can be retrieved from petabyte-scale datasets, known as Big Data, has led to the development of solutions that process information using parallel and distributed computing. Lately, Apache Hadoop has attracted strong attention due to its applicability to Big Data processing.

Problem: Support for Hadoop from the research community has led to the development of new features for the framework. Recently, the number of journal and conference publications about Hadoop has increased consistently, making it difficult for researchers to comprehend the full body of research and the areas that require further investigation.

Solution: We conducted a systematic literature review to assess research contributions to Apache Hadoop. Our objective was to identify gaps, providing motivation for new research, and to outline contributions to Apache Hadoop and its ecosystem, classifying and quantifying the main topics addressed in the literature.

Results: Our analysis led to several relevant conclusions. First, many interesting solutions developed in the studies were never incorporated into the framework. Second, most publications lack sufficient formal documentation of the experiments conducted by the authors, hindering their reproducibility. Finally, the systematic review presented in this paper demonstrates that Hadoop has evolved into a solid platform for processing large datasets, but we were able to spot promising areas and suggest topics for future research within the framework.

© 2014 Elsevier Ltd. All rights reserved.
Pages: 1-25
Number of pages: 25