Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing

被引:68
|
作者
Koga, Hisashi [1 ]
Ishibashi, Tetsuo [1 ]
Watanabe, Toshinori [1 ]
机构
[1] Univ Electrocommun, Grad Sch Informat Syst, Tokyo 1828585, Japan
关键词
hierarchical clustering; single linkage method; Locality-Sensitive Hashing; outlier removal;
D O I
10.1007/s10115-006-0027-5
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The single linkage method is a fundamental agglomerative hierarchical clustering algorithm. This algorithm regards each point as a single cluster initially. In the agglomeration step, it connects a pair of clusters such that the distance between the nearest members is the shortest. This step is repeated until only one cluster remains. The single linkage method can efficiently detect clusters in arbitrary shapes. However, a drawback of this method is a large time complexity of O(n(2)), where n represents the number of data points. This time complexity makes this method infeasible for large data. This paper proposes a fast approximation algorithm for the single linkage method. Our algorithm reduces the time complexity to O(nB) by rapidly finding the near clusters to be connected by Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. Here, B represents the maximum number of points going into a single hash entry and it practically diminishes to a small constant as compared to n for sufficiently large hash tables. Experimentally, we show that (1) the proposed algorithm obtains clustering results similar to those obtained by the single linkage method and (2) it runs faster for large data than the single linkage method.
引用
收藏
页码:25 / 53
页数:29
相关论文
共 50 条
  • [41] Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket
    Fang, Bo
    Hua, Zhongyun
    Huang, Hejiao
    14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019), 2019, : 5 - 10
  • [42] VBLSH: Volume-balancing locality-sensitive hashing algorithm for K-nearest neighbors search
    Zhang, Shi
    Lai, Huixia
    Chen, Weilin
    Zhang, Lulu
    Lin, Xinhong
    Xiao, Ruliang
    INFORMATION SCIENCES, 2022, 587 : 774 - 793
  • [43] A Trajectory-Oriented Locality-Sensitive Hashing Method for User Identification
    Li, Yongjun
    Li, Xiangyu
    Ji, Wenli
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (06) : 2343 - 2356
  • [44] Improved locality-sensitive hashing method for the approximate nearest neighbor problem
    Lu Ying-Hua
    Ma Ting-Huai
    Zhong Shui-Ming
    Cao Jie
    Wang Xin
    Al-Dhelaan, Abdullah
    CHINESE PHYSICS B, 2014, 23 (08)
  • [45] Subspace k-anonymity algorithm for location-privacy preservation based on locality-sensitive hashing
    Wang, Xiaohan
    Luo, Yonglong
    Liu, Shiyang
    Wang, Taochun
    Han, Huihui
    INTELLIGENT DATA ANALYSIS, 2019, 23 (05) : 1167 - 1185
  • [46] An Approach for Fast Hierarchical Agglomerative Clustering Using Graphics Processors with CUDA
    Shalom, S. A. Arul
    Dash, Manoranjan
    Tue, Minh
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PROCEEDINGS, 2010, 6119 : 35 - +
  • [47] Efficient locality-sensitive hashing over high-dimensional streaming data
    Wang, Hao
    Yang, Chengcheng
    Zhang, Xiangliang
    Gao, Xin
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (05) : 3753 - 3766
  • [48] Privacy-preserving Distributed Service Recommendation based on Locality-Sensitive Hashing
    Qi, Lianyong
    Xiang, Haolong
    Dou, Wanchun
    Yang, Chi
    Qin, Yongrui
    Zhang, Xuyun
    2017 IEEE 24TH INTERNATIONAL CONFERENCE ON WEB SERVICES (ICWS 2017), 2017, : 49 - 56
  • [49] Query-aware locality-sensitive hashing scheme for l p norm
    Huang, Qiang
    Feng, Jianlin
    Fang, Qiong
    Ng, Wilfred
    Wang, Wei
    VLDB JOURNAL, 2017, 26 (05) : 683 - 708
  • [50] A Graph Classification Method Based on Support Vector Machines and Locality-Sensitive Hashing
    Gonzalez-Lima, Maria D.
    Ludena, Carenne C.
    Otazo-Sanchez, Gibran G.
    IEEE ACCESS, 2024, 12 : 15791 - 15799