Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing

被引:68
|
作者
Koga, Hisashi [1 ]
Ishibashi, Tetsuo [1 ]
Watanabe, Toshinori [1 ]
机构
[1] Univ Electrocommun, Grad Sch Informat Syst, Tokyo 1828585, Japan
关键词
hierarchical clustering; single linkage method; Locality-Sensitive Hashing; outlier removal;
D O I
10.1007/s10115-006-0027-5
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The single linkage method is a fundamental agglomerative hierarchical clustering algorithm. This algorithm regards each point as a single cluster initially. In the agglomeration step, it connects a pair of clusters such that the distance between the nearest members is the shortest. This step is repeated until only one cluster remains. The single linkage method can efficiently detect clusters in arbitrary shapes. However, a drawback of this method is a large time complexity of O(n(2)), where n represents the number of data points. This time complexity makes this method infeasible for large data. This paper proposes a fast approximation algorithm for the single linkage method. Our algorithm reduces the time complexity to O(nB) by rapidly finding the near clusters to be connected by Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. Here, B represents the maximum number of points going into a single hash entry and it practically diminishes to a small constant as compared to n for sufficiently large hash tables. Experimentally, we show that (1) the proposed algorithm obtains clustering results similar to those obtained by the single linkage method and (2) it runs faster for large data than the single linkage method.
引用
收藏
页码:25 / 53
页数:29
相关论文
共 50 条
  • [22] A Novel Cluster Prediction Approach Based on Locality-Sensitive Hashing for Fuzzy Clustering of Categorical Data
    Toan Nguyen Mau
    Inoguchi, Yasushi
    Van-Nam Huynh
    IEEE ACCESS, 2022, 10 : 34196 - 34206
  • [23] Fast Duplicate Detection Using Locality Sensitive Hashing
    Rong, C. T.
    Feng, L. J.
    INTERNATIONAL CONFERENCE ON ADVANCED EDUCATIONAL TECHNOLOGY AND INFORMATION ENGINEERING (AETIE 2015), 2015, : 580 - 588
  • [24] An improved method of locality-sensitive hashing for scalable instance matching
    Mehmet Aydar
    Serkan Ayvaz
    Knowledge and Information Systems, 2019, 58 : 275 - 294
  • [25] On the Problem of p1-1 in Locality-Sensitive Hashing
    Ahle, Thomas Dybdahl
    SIMILARITY SEARCH AND APPLICATIONS, SISAP 2020, 2020, 12440 : 85 - 93
  • [26] Locality-Sensitive Hashing for Finding Nearest Neighbors in Probability Distributions
    Tang, Yi-Kun
    Mao, Xian-Ling
    Hao, Yi-Jing
    Xu, Cheng
    Huang, Heyan
    SOCIAL MEDIA PROCESSING, SMP 2017, 2017, 774 : 3 - 15
  • [27] Can LSH (locality-sensitive hashing) be replaced by neural network?
    Liu, Renyang
    Zhao, Jun
    Chu, Xing
    Liang, Yu
    Zhou, Wei
    He, Jing
    SOFT COMPUTING, 2024, 28 (02) : 887 - 902
  • [28] Locality-Sensitive Hashing for Efficient Web Application Security Testing
    Ben-Bassat, Ilan
    Rokah, Erez
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY (ICISSP), 2019, : 193 - 204
  • [29] Can LSH (locality-sensitive hashing) be replaced by neural network?
    Renyang Liu
    Jun Zhao
    Xing Chu
    Yu Liang
    Wei Zhou
    Jing He
    Soft Computing, 2024, 28 : 1041 - 1053
  • [30] Locality-Sensitive Hashing for Efficient Rendezvous Search: A New Approach
    Jiang, Guann-Yng
    Chang, Cheng-Shang
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2024, 72 (09) : 5674 - 5687