NNB: An Efficient Nearest Neighbor Search Method for Hierarchical Clustering on Large Datasets

被引:0
作者
Zhang, Wei [1 ,2 ]
Zhang, Gongxuan [1 ]
Wang, Yongli [1 ]
Zhu, Zhaomeng [1 ]
Li, Tao [1 ]
机构
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China
[2] Huaiyin Normal Univ, Sch Comp Sci & Technol, Huaian 223300, Peoples R China
来源
2015 IEEE 9TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC) | 2015年
关键词
Hierarchical clustering; nearest neighbor boundary; parallel and distributed computing; MapReduce;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Nearest neighbor search is a key technique used in hierarchical clustering. The time complexity of standard agglomerative hierarchical clustering is O(n(3)), while the time complexity of more advanced hierarchical clustering algorithms (such as nearest neighbor chain) is O(n(2)). This paper presents a new nearest neighbor search method called nearest neighbor boundary(NNB), which first divides a large dataset into independent subsets and then finds nearest neighbor of each point in the subsets. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log(2) n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering(NBC), and the proposed algorithm can also be adapted to the parallel and distributed computing frameworks. The experimental results demonstrate that our proposal algorithm is practical for large datasets.
引用
收藏
页码:405 / 412
页数:8
相关论文
共 21 条
  • [1] Abbas M. A., 2012, 2012 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA), P1192, DOI 10.1109/ISSPA.2012.6310472
  • [2] [Anonymous], COMPUTATIONAL GEOMET
  • [3] MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING
    BENTLEY, JL
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (09) : 509 - 517
  • [4] Parallel Spectral Clustering in Distributed Systems
    Chen, Wen-Yen
    Song, Yangqiu
    Bai, Hongjie
    Lin, Chih-Jen
    Chang, Edward Y.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (03) : 568 - 586
  • [5] EFFICIENT ALGORITHMS FOR AGGLOMERATIVE HIERARCHICAL-CLUSTERING METHODS
    DAY, WHE
    EDELSBRUNNER, H
    [J]. JOURNAL OF CLASSIFICATION, 1984, 1 (01) : 7 - 24
  • [6] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
  • [7] EFFICIENT ALGORITHM FOR A COMPLETE LINK METHOD
    DEFAYS, D
    [J]. COMPUTER JOURNAL, 1977, 20 (04) : 364 - 366
  • [8] Ding C, 2005, LECT NOTES ARTIF INT, V3721, P71
  • [9] Finkel R. A., 1974, Acta Informatica, V4, P1, DOI 10.1007/BF00288933
  • [10] Guha S., 1998, CURE, P73, DOI DOI 10.1145/276305.276312