A novel algorithm for scalable k-nearest neighbour graph construction

被引:5
|
作者
Park, Youngki [1 ]
Hwang, Heasoo [2 ]
Lee, Sang-goo [1 ]
机构
[1] Seoul Natl Univ, Seoul 151, South Korea
[2] Univ Seoul, Seoul, South Korea
基金
新加坡国家研究基金会;
关键词
collaborative filtering; k-nearest neighbour search; k-nearest neighbour graph construction;
D O I
10.1177/0165551515594728
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations with wide application in various areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features have Zipfian distribution, a characteristic observed in majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms other algorithms in various types of high-dimensional datasets.
引用
收藏
页码:274 / 288
页数:15
相关论文
共 50 条
  • [1] Greedy Filtering: A Scalable Algorithm for K-Nearest Neighbor Graph Construction
    Park, Youngki
    Park, Sungchan
    Lee, Sang-goo
    Jung, Woosung
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT I, 2014, 8421 : 327 - 341
  • [2] Scalable K-Nearest Neighbor Graph Construction Based on Greedy Filtering
    Park, Youngki
    Park, Sungchan
    Lee, Sang-goo
    Jung, Woosung
    PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'13 COMPANION), 2013, : 227 - 228
  • [3] Outlier detection using k-nearest neighbour graph
    Hautamäki, V
    Kärkkäinen, I
    Fränti, P
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, 2004, : 430 - 433
  • [4] A fast fuzzy K-nearest neighbour algorithm for pattern classification
    Boutalis, Yiannis S.
    Andreadis, Ioannis T.
    Tambakis, George D.
    Intelligent Data Analysis, 2000, 4 (3-4) : 275 - 288
  • [5] Prediction of MUET Results Based on K-Nearest Neighbour Algorithm
    Sabri N.M.
    Hamrizan S.F.A.
    Annals of Emerging Technologies in Computing, 2023, 7 (05) : 50 - 59
  • [6] Adaptive K-nearest neighbour algorithm for WiFi fingerprint positioning
    Oh, Jongtaek
    Kim, Jisu
    ICT EXPRESS, 2018, 4 (02): : 91 - 94
  • [7] Arabic Text Classification Using K-Nearest Neighbour Algorithm
    Alhutaish, Roiss
    Omar, Nazlia
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2015, 12 (02) : 190 - 195
  • [8] Balanced k-nearest neighbour imputation
    Hasler, Caren
    Tille, Yves
    STATISTICS, 2016, 50 (06) : 1310 - 1331
  • [9] Fast Algorithm for Approximate k-Nearest Neighbor Graph Construction3
    Wang, Dilin
    Shi, Lei
    Cao, Jianwen
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2013, : 349 - 356
  • [10] k-Nearest Neighbour Classifiers - A Tutorial
    Cunningham, Padraig
    Delany, Sarah Jane
    ACM COMPUTING SURVEYS, 2021, 54 (06)