Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce

Cited by: 16
Authors
Song, Ge [1 ,2 ]
Rochas, Justine [1 ]
Huet, Fabrice [1 ]
Magoules, Frederic [2 ]
Affiliations
[1] Univ Nice Sophia Antipolis, CNRS, I3S, UMR 7271, F-06900 Sophia Antipolis, France
[2] Ecole Cent Paris, Chatenay Malabry, France
Source
23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015) | 2015
Keywords
kNN Join; Data Partition; Hadoop; MapReduce; SEARCH;
DOI
10.1109/PDP.2015.79
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. However, as the volume and dimension of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model because it is suitable for large-scale data processing and can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has particular constraints and properties, and there is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. First, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Second, we describe precisely the different techniques published so far. Finally, we provide a set of objective criteria that can be used to make informed decisions.
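The kNN operation defined in the abstract can be illustrated with a minimal single-machine sketch. It is a brute-force baseline only and does not reproduce any of the MapReduce solutions surveyed in the paper. The function names `knn` and `knn_join`, the use of Euclidean distance, and the sample points are assumptions made for illustration.

```python
import heapq
import math


def knn(p, S, k):
    """Return the k points of S closest to p (Euclidean distance assumed)."""
    return heapq.nsmallest(k, S, key=lambda s: math.dist(p, s))


def knn_join(R, S, k):
    """Brute-force kNN join: map each point r in R to its k nearest points in S."""
    return {r: knn(r, S, k) for r in R}


# Hypothetical sample data: S is the searched set, R holds the query points.
S = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
R = [(0.9, 0.9), (4.0, 4.0)]

result = knn_join(R, S, 2)
for r, neighbors in result.items():
    print(r, "->", neighbors)
```

This baseline computes |R| x |S| distances, which is the cost that motivates the distributed approaches the abstract refers to.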
Pages: 279-287
Page count: 9
Related papers
50 in total
  • [21] Processing generalized k-nearest neighbor queries on a wireless broadcast stream
    Jung, HaRim
    Chung, Yon Dohn
    Liu, Ling
    INFORMATION SCIENCES, 2012, 188 : 64 - 79
  • [22] Scalable Distributed Processing of K Nearest Neighbor Queries over Moving Objects
    Yu, Ziqiang
    Liu, Yang
    Yu, Xiaohui
    Pu, Ken Q.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (05) : 1383 - 1396
  • [23] Algorithms for processing the group K nearest-neighbor query on distributed frameworks
    Panagiotis Moutafis
    Francisco García-García
    George Mavrommatis
    Michael Vassilakopoulos
    Antonio Corral
    Luis Iribarne
    Distributed and Parallel Databases, 2021, 39 : 733 - 784
  • [24] Algorithm for processing k-nearest join based on R-tree in MapReduce
    Liu, Yi
    Jing, Ning
    Chen, Luo
    Xiong, Wei
    Ruan Jian Xue Bao/Journal of Software, 2013, 24 (08) : 1836 - 1851
  • [25] Approximate direct and reverse nearest neighbor queries, and the k-nearest neighbor graph
    Figueroa, Karina
    Paredes, Rodrigo
    SISAP 2009: 2009 SECOND INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS, 2009, : 91 - +
  • [26] Analysis of Massive Industrial Data using MapReduce Framework for Parallel Processing
    Aly, Mohab
    Yacout, Soumaya
    Shaban, Yasser
    2017 ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM, 2017,
  • [27] MR-SNN: Design of Parallel Shared Nearest Neighbor Clustering Algorithm Using MapReduce
    Wang, Sujing
    Eick, Christoph F.
    2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2017, : 317 - 320
  • [28] Strategic and suave processing for performing similarity joins using MapReduce
    Mahalakshmi Lakshminarayanan
    William F. Acosta
    Robert C. Green
    Vijay Devabhaktuni
    The Journal of Supercomputing, 2014, 69 : 930 - 954
  • [29] Strategic and suave processing for performing similarity joins using MapReduce
    Lakshminarayanan, Mahalakshmi
    Acosta, William F.
    Green, Robert C., II
    Devabhaktuni, Vijay
    JOURNAL OF SUPERCOMPUTING, 2014, 69 (02) : 930 - 954
  • [30] Prominence of MapReduce in BIG DATA Processing
    Pandey, Shweta
    Tokekar, Vrinda
    2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 555 - 560