Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce

Cited by: 16
Authors
Song, Ge [1 ,2 ]
Rochas, Justine [1 ]
Huet, Fabrice [1 ]
Magoules, Frederic [2 ]
Affiliations
[1] Univ Nice Sophia Antipolis, CNRS, I3S, UMR 7271, F-06900 Sophia Antipolis, France
[2] Ecole Cent Paris, Chatenay Malabry, France
Source
23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015) | 2015
Keywords
kNN Join; Data Partition; Hadoop; MapReduce; SEARCH;
DOI
10.1109/PDP.2015.79
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. However, as the volume and the dimension of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model, because it is suitable for large-scale data processing and can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has particular constraints and properties. There is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. Firstly, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Secondly, we describe precisely the different techniques published so far. Lastly, we provide a set of objective criteria that can be used to make informed decisions.
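The kNN operation defined at the start of the abstract can be illustrated with a minimal single-machine sketch. This is a brute-force baseline, not any of the MapReduce techniques the paper surveys. The function names `knn` and `knn_join`, the use of Euclidean distance, and the sample points are assumptions made for illustration only. The join variant assumes the usual reading of "kNN join" from the title: for every point of a second set R, find its k nearest neighbors in S.

```python
import heapq
import math


def knn(p, S, k):
    """Return the k points of S closest to p.

    Assumption: distance is Euclidean; the abstract does not fix a metric.
    """
    return heapq.nsmallest(k, S, key=lambda s: math.dist(p, s))


def knn_join(R, S, k):
    """Brute-force kNN join: map each point r of R to its k nearest points in S.

    Cost is O(|R| * |S|) distance computations, which is why the abstract
    argues for distributed approaches on large inputs.
    """
    return {r: knn(r, S, k) for r in R}


# Hypothetical sample data.
S = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
print(knn((0.9, 0.9), S, 2))
print(knn_join([(2.8, 2.8)], S, 1))
```

Every query point is compared against every point of S, so the work grows with the product of the two set sizes. That quadratic cost is the motivation the abstract gives for partitioning the data and distributing the computation.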
Pages: 279-287
Number of pages: 9