Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing

被引:68
|
作者
Koga, Hisashi [1 ]
Ishibashi, Tetsuo [1 ]
Watanabe, Toshinori [1 ]
机构
[1] Univ Electrocommun, Grad Sch Informat Syst, Tokyo 1828585, Japan
关键词
hierarchical clustering; single linkage method; Locality-Sensitive Hashing; outlier removal;
D O I
10.1007/s10115-006-0027-5
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The single linkage method is a fundamental agglomerative hierarchical clustering algorithm. This algorithm regards each point as a single cluster initially. In the agglomeration step, it connects a pair of clusters such that the distance between the nearest members is the shortest. This step is repeated until only one cluster remains. The single linkage method can efficiently detect clusters in arbitrary shapes. However, a drawback of this method is a large time complexity of O(n(2)), where n represents the number of data points. This time complexity makes this method infeasible for large data. This paper proposes a fast approximation algorithm for the single linkage method. Our algorithm reduces the time complexity to O(nB) by rapidly finding the near clusters to be connected by Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. Here, B represents the maximum number of points going into a single hash entry and it practically diminishes to a small constant as compared to n for sufficiently large hash tables. Experimentally, we show that (1) the proposed algorithm obtains clustering results similar to those obtained by the single linkage method and (2) it runs faster for large data than the single linkage method.
引用
收藏
页码:25 / 53
页数:29
相关论文
共 50 条
  • [1] Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing
    Hisashi Koga
    Tetsuo Ishibashi
    Toshinori Watanabe
    Knowledge and Information Systems, 2007, 12 : 25 - 53
  • [2] An Improved Algorithm for Locality-Sensitive Hashing
    Cen, Wei
    Miao, Kehua
    10TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2015), 2015, : 61 - 64
  • [3] Fast Redescription Mining Using Locality-Sensitive Hashing
    Karjalainen, Maiju
    Galbrun, Esther
    Miettinen, Pauli
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT VII, ECML PKDD 2024, 2024, 14947 : 124 - 142
  • [4] An adaptive mean shift clustering algorithm based on locality-sensitive hashing
    Zhang, Xinhong
    Cui, Yanbin
    Li, Duoyi
    Liu, Xianxing
    Zhang, Fan
    OPTIK, 2012, 123 (20): : 1891 - 1894
  • [5] Using Locality-sensitive Hashing for Rendezvous Search
    Jiang, Guann-Yng
    Chang, Cheng-Shang
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 1743 - 1749
  • [6] Kernelized Locality-Sensitive Hashing
    Kulis, Brian
    Grauman, Kristen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (06) : 1092 - 1104
  • [7] Hardware acceleration of k-mer clustering using locality-sensitive hashing
    Soto, Javier E.
    Krohmer, Thomas
    Hernandez, Cecilia
    Figueroa, Miguel
    2019 22ND EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2019, : 659 - 662
  • [8] Fast Access for Star Catalog Based on Locality-Sensitive Hashing
    Zhu H.
    Liang B.
    Zhang T.
    Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University, 2018, 36 (05): : 988 - 994
  • [9] Locality-sensitive Hashing scheme for Bangla News Article Clustering using Bloom Filter
    Nath, Subrata
    Singha, Pranab
    Islam, Md. Saiful
    2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION ENGINEERING (ECCE), 2017, : 17 - 21
  • [10] A privacy-preserving recommendation method with clustering and locality-sensitive hashing
    Zhang, Hanrui
    Li, Qianmu
    Xu, Jiangmin
    Meng, Shunmei
    Hou, Jun
    COMPUTATIONAL INTELLIGENCE, 2023, 39 (01) : 121 - 144