Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce

Cited by: 16
Authors
Song, Ge [1, 2]
Rochas, Justine [1]
Huet, Fabrice [1]
Magoules, Frederic [2]
Affiliations
[1] Univ Nice Sophia Antipolis, CNRS, I3S, UMR 7271, F-06900 Sophia Antipolis, France
[2] Ecole Cent Paris, Chatenay Malabry, France
Source
23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015) | 2015
Keywords
kNN Join; Data Partition; Hadoop; MapReduce; SEARCH;
DOI
10.1109/PDP.2015.79
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a large range of applications such as knowledge discovery or data mining. However, as the volume and the dimension of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model because it is suitable for large scale data processing. Also, it can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has particular constraints and properties. There is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. Firstly, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Secondly, we describe precisely the different techniques published so far. And lastly, we provide a set of objective criteria that can be used to make informed decisions.
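The kNN operation defined at the start of the abstract can be illustrated with a minimal single-machine sketch. This is not one of the MapReduce techniques the paper surveys; it is only a brute-force baseline that makes the definition concrete. The function names `knn` and `knn_join` are hypothetical, and Euclidean distance is an assumption, since the abstract does not name a distance metric.

```python
import heapq
import math


def knn(p, S, k):
    """Return the k points of S closest to p.

    Assumption: points are tuples of numbers and closeness is
    Euclidean distance (the abstract does not fix a metric).
    """
    return heapq.nsmallest(k, S, key=lambda s: math.dist(p, s))


def knn_join(R, S, k):
    """Brute-force kNN join: for every point r in R, find its k nearest
    neighbors in S. Every pair (r, s) is compared, so the cost grows
    with |R| * |S|, which is why large inputs call for distribution.
    """
    return {r: knn(r, S, k) for r in R}


if __name__ == "__main__":
    S = [(0, 0), (1, 1), (5, 5), (2, 2)]
    print(knn((0, 0), S, 2))
    print(knn_join([(4, 4)], S, 1))
```

The join compares every pair of points, which illustrates why the abstract calls the operation costly as data volume grows.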
Pages: 279-287
Page count: 9
Related papers
50 in total
[41]   Massive data MapReduce fingerprint discriminant algorithm Based on Hadoop [J].
Lu, Wei ;
Huang, Jun ;
Hong, Lin .
INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY, PTS 1-4, 2013, 263-266 :2655-+
[42]   Aggregate k Nearest Neighbor Queries in Metric Spaces [J].
Ding, Xin ;
Zhang, Yuanliang ;
Chen, Lu ;
Yang, Keyu ;
Gao, Yunjun .
WEB AND BIG DATA (APWEB-WAIM 2018), PT II, 2018, 10988 :317-333
[43]   Parallel labeling of massive XML data with MapReduce [J].
Choi, Hyebong ;
Lee, Kyong-Ha ;
Lee, Yoon-Joon .
JOURNAL OF SUPERCOMPUTING, 2014, 67 (02) :408-437
[44]   Parallel labeling of massive XML data with MapReduce [J].
Hyebong Choi ;
Kyong-Ha Lee ;
Yoon-Joon Lee .
The Journal of Supercomputing, 2014, 67 :408-437
[45]   Large Scale, Complex Processing of Health Data with MapReduce [J].
Nguyen, Khanh Luan P. ;
Ashish, Naveen .
JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2014, 13 (01)
[46]   A Comprehensive Survey of MapReduce Models for Processing Big Data [J].
Abdalla, Hemn Barzan ;
Kumar, Yulia ;
Zhao, Yue ;
Tosi, Davide .
BIG DATA AND COGNITIVE COMPUTING, 2025, 9 (04)
[47]   Research of a MapReduce Communication Data Stream Processing Model [J].
Yang, Wenchuan ;
Jia, Bei .
PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON CLOUD COMPUTING AND INFORMATION SECURITY (CCIS 2013), 2013, 52 :28-31
[48]   Comparing MapReduce-Based k-NN Similarity Joins on Hadoop for High-Dimensional Data [J].
Cech, Premysl ;
Marousek, Jakub ;
Lokoc, Jakub ;
Silva, Yasin N. ;
Starks, Jeremy .
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2017, 2017, 10604 :63-75
[49]   Scientific data processing using MapReduce in cloud environments [J].
Kong, Xiangsheng .
Information Technology Journal, 2013, 12 (23) :7869-7873
[50]   MapReduce-Based Improved Random Forest Model for Massive Educational Data Processing and Classification [J].
Wei Xu ;
Vinh Truong Hoang .
Mobile Networks and Applications, 2021, 26 :191-199