MapReduce Data Skewness Handling: A Systematic Literature Review

Cited by: 10
Authors
Irandoost, Mohammad Amin [1]
Rahmani, Amir Masoud [1]
Setayeshi, Saeed [2]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Sci & Res Branch, Tehran, Iran
[2] Amirkabir Univ Technol, Dept Med Radiat Engn, Tehran, Iran
Keywords
Data skewness; MapReduce; Load balancing; Big data; Systematic literature review; Survey; PROGRAMMING-MODEL; PERFORMANCE; MITIGATION; ALGORITHM
DOI
10.1007/s10766-019-00627-0
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
One of the most successful techniques for large-scale, data-intensive computation is MapReduce programming. MapReduce is based on a divide-and-conquer approach that uses commodity computers, also known as nodes, for parallel processing. The scalability and performance of this technique depend largely on how data is distributed across map and reduce tasks. For many reasons, such as node failure, network failure, and data skewness, one task can take longer to complete than the others, and job completion time is determined by the slowest task. One of the most important reasons a task takes longer than the others is data skewness. Despite the widespread use of MapReduce, owing to its high flexibility and fault tolerance, it cannot fully utilize the nodes for parallel processing in the presence of data skewness. The objectives of this study were to review related articles, classify them by the type of problem addressed, and determine their advantages and disadvantages. Open issues were also defined to provide guidelines for future research on this subject. To achieve these objectives, research questions were defined and answered. The review concluded that important parameters have not been considered in MapReduce data skewness handling approaches.
Pages: 907-950
Number of pages: 44