Parallel Algorithms for Constructing Range and Nearest-Neighbor Searching Data Structures

被引:14
|
作者
Agarwal, Pankaj K. [1 ]
Fox, Kyle [1 ]
Munagala, Kamesh [1 ]
Nath, Abhinandan [1 ]
机构
[1] Duke Univ, Durham, NC 27706 USA
基金
美国国家科学基金会;
关键词
MODEL; COMPLEXITY; MAPREDUCE;
D O I
10.1145/2902251.2902303
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
With the massive amounts of data available today, it is common to store and process data using multiple machines. Parallel programming platforms such as MapReduce and its variants are popular frameworks for handling such large data. We present the first provably efficient algorithms to compute, store, and query data structures for range queries and approximate nearest neighbor queries in a popular parallel computing abstraction that captures the salient features of MapReduce and other massively parallel communication (MPC) models. In particular, we describe algorithms for kd-trees, range trees, and BBD-trees that only require O(1) rounds of communication for both preprocessing and querying while staying competitive in terms of running time and workload to their classical counterparts. Our algorithms are randomized, but they can be made deterministic at some increase in their running time and workload while keeping the number of rounds of communication to be constant.
引用
收藏
页码:429 / 440
页数:12
相关论文
共 50 条
  • [41] Two fast nearest neighbor searching algorithms for vector quantization
    Baek, S
    Sung, KM
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2001, E84A (10): : 2569 - 2575
  • [42] Two fast nearest neighbor searching algorithms for vector quantization
    Baek, S.J.
    Sung, K.-M.
    2001, Institute of Electronics, Information and Communication, Engineers, IEICE (E84-A)
  • [43] Algorithms for processing the group K nearest-neighbor query on distributed frameworks
    Panagiotis Moutafis
    Francisco García-García
    George Mavrommatis
    Michael Vassilakopoulos
    Antonio Corral
    Luis Iribarne
    Distributed and Parallel Databases, 2021, 39 : 733 - 784
  • [44] A model for synchronization of parallel control processes in nearest-neighbor microcontroller networks
    Zotov, I.V.
    Avtomatika i Vychislitel'naya Tekhnika, 2001, (03): : 44 - 56
  • [45] Scalable Parallel Algorithms for Shared Nearest Neighbor Clustering
    Kumari, Sonal
    Maurya, Saurabh
    Goyal, Poonam
    Balasubramaniam, Sundar S.
    Goyal, Navneet
    PROCEEDINGS OF 2016 IEEE 23RD INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2016, : 72 - 81
  • [46] Multi-Threaded Hierarchical Clustering by Parallel Nearest-Neighbor Chaining
    Jeon, Yongkweon
    Yoon, Sungroh
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (09) : 2534 - 2548
  • [47] Effectiveness of nearest-neighbor data adjustment in a clonal test of redwood
    Anekonda, TS
    Libby, WJ
    SILVAE GENETICA, 1996, 45 (01) : 46 - 51
  • [48] ESTIMATING INTERACTIONS IN BINARY LATTICE DATA WITH NEAREST-NEIGHBOR PROPERTY
    JANZURA, M
    KYBERNETIKA, 1987, 23 (02) : 136 - 142
  • [49] NearTree, a data structure and a software toolkit for the nearest-neighbor problem
    Andrews, Lawrence C.
    Bernstein, Herbert J.
    JOURNAL OF APPLIED CRYSTALLOGRAPHY, 2016, 49 : 756 - 761
  • [50] On nearest-neighbor Gaussian process models for massive spatial data
    Datta, Abhirup
    Banerjee, Sudipto
    Finley, Andrew O.
    Gelfand, Alan E.
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2016, 8 (05): : 162 - 171