Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing

被引:68
|
作者
Koga, Hisashi [1 ]
Ishibashi, Tetsuo [1 ]
Watanabe, Toshinori [1 ]
机构
[1] Univ Electrocommun, Grad Sch Informat Syst, Tokyo 1828585, Japan
关键词
hierarchical clustering; single linkage method; Locality-Sensitive Hashing; outlier removal;
D O I
10.1007/s10115-006-0027-5
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The single linkage method is a fundamental agglomerative hierarchical clustering algorithm. This algorithm regards each point as a single cluster initially. In the agglomeration step, it connects a pair of clusters such that the distance between the nearest members is the shortest. This step is repeated until only one cluster remains. The single linkage method can efficiently detect clusters in arbitrary shapes. However, a drawback of this method is a large time complexity of O(n(2)), where n represents the number of data points. This time complexity makes this method infeasible for large data. This paper proposes a fast approximation algorithm for the single linkage method. Our algorithm reduces the time complexity to O(nB) by rapidly finding the near clusters to be connected by Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. Here, B represents the maximum number of points going into a single hash entry and it practically diminishes to a small constant as compared to n for sufficiently large hash tables. Experimentally, we show that (1) the proposed algorithm obtains clustering results similar to those obtained by the single linkage method and (2) it runs faster for large data than the single linkage method.
引用
收藏
页码:25 / 53
页数:29
相关论文
共 50 条
  • [31] Secure Approximate Nearest Neighbor Search with Locality-Sensitive Hashing
    Song, Shang
    Liu, Lin
    Chen, Rongmao
    Peng, Wei
    Wang, Yi
    COMPUTER SECURITY - ESORICS 2023, PT III, 2024, 14346 : 411 - 430
  • [32] Scalable Graph Representation Learning via Locality-Sensitive Hashing
    Chen, Xiusi
    Jiang, Jyun-Yu
    Wang, Wei
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 3878 - 3882
  • [33] An improved method of locality-sensitive hashing for scalable instance matching
    Aydar, Mehmet
    Ayvaz, Serkan
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 58 (02) : 275 - 294
  • [34] Speeding up probabilistic roadmap planners with locality-sensitive hashing
    Rantanen, Mika T.
    Juhola, Martti
    ROBOTICA, 2015, 33 (07) : 1491 - 1506
  • [35] Batch-Orthogonal Locality-Sensitive Hashing for Angular Similarity
    Ji, Jianqiu
    Yan, Shuicheng
    Li, Jianmin
    Gao, Guangyu
    Tian, Qi
    Zhang, Bo
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (10) : 1963 - 1974
  • [36] MinIsoClust: Isoform clustering using minhash and locality sensitive hashing
    Behera, Sairam
    Deogun, Jitender S.
    Moriyama, Etsuko N.
    ACM-BCB 2020 - 11TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2020,
  • [37] An incremental community detection method for social tagging systems using locality-sensitive hashing
    Wu, Zhenyu
    Zou, Ming
    NEURAL NETWORKS, 2014, 58 : 14 - 28
  • [38] Locality-Sensitive Hashing for Information Retrieval System on Multiple GPGPU Devices
    Toan Nguyen Mau
    Inoguchi, Yasushi
    APPLIED SCIENCES-BASEL, 2020, 10 (07):
  • [39] Fast and Accurate Workload Characterization Using Locality Sensitive Hashing
    Islam, Mohammad Shahedul
    Gibson, Matt
    Muzahid, Abdullah
    2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 1192 - 1201
  • [40] Improved locality-sensitive hashing method for the approximate nearest neighbor problem
    陆颖华
    马廷淮
    钟水明
    曹杰
    王新
    Abdullah Al-Dhelaane
    Chinese Physics B, 2014, (08) : 221 - 229