Scalable clustering by aggregating representatives in hierarchical groups

Cited by: 13
Authors
Xie, Wen-Bo [1 ]
Liu, Zhen [2 ,3 ]
Das, Debarati [4 ]
Chen, Bin [1 ]
Srivastava, Jaideep [4 ]
Affiliations
[1] Southwest Petr Univ, Sch Comp Sci, Chengdu 610500, Peoples R China
[2] Univ Elect Sci & Technol China, Web Sci Ctr, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] Univ Elect Sci & Technol China, Big Data Res Ctr, Chengdu 611731, Peoples R China
[4] Univ Minnesota, Coll Sci & Engn, Minneapolis, MN 55455 USA
Keywords
Hierarchical clustering; Election tree; Representative node; Root; Algorithm; Search; Trees
DOI
10.1016/j.patcog.2022.109230
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Appropriately handling the scalability of clustering is a long-standing challenge in the study of clustering techniques and is of fundamental interest to researchers in the data mining and knowledge discovery community. Compared with other clustering methods, hierarchical clustering offers better interpretability of clustering results but scales poorly to large-scale data, so more comprehensive studies of this problem are needed. This paper develops a new scalable hierarchical clustering model called Election Tree, which detects the most representative point of each sub-cluster through a process of node election over split data and adjusts sub-cluster membership through node merging and swapping operations. Extensive experiments on real-world datasets show that the proposed computational framework achieves better clustering accuracy than the competing baseline methods. Meanwhile, in scalability tests on incrementally sized synthetic datasets, the new model consumes significantly less time than state-of-the-art hierarchical clustering models such as PERCH, GRINCH, and SCC, as well as other classic baselines. (c) 2022 Elsevier Ltd. All rights reserved.
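The abstract only sketches the idea of electing a representative point per sub-cluster and merging clusters via those representatives; the paper's actual Election Tree construction, splitting, and swap rules are not given here. As a purely illustrative toy sketch (not the authors' algorithm), a medoid-style "election" combined with representative-based agglomeration could look like this, with all function names and the election rule being assumptions for illustration:

```python
import numpy as np

def elect_representative(points):
    """Elect the member closest to the cluster centroid -- a simple
    medoid-style stand-in for a node-election step (illustrative only)."""
    centroid = points.mean(axis=0)
    dists = np.linalg.norm(points - centroid, axis=1)
    return points[np.argmin(dists)]

def cluster_by_representatives(data, n_clusters):
    """Agglomerate clusters by repeatedly merging the pair whose elected
    representatives are closest, until n_clusters remain."""
    clusters = [data[i:i + 1] for i in range(len(data))]
    while len(clusters) > n_clusters:
        reps = [elect_representative(c) for c in clusters]
        # Find the pair of clusters with the closest representatives.
        best, best_d = (0, 1), np.inf
        for i in range(len(reps)):
            for j in range(i + 1, len(reps)):
                d = np.linalg.norm(reps[i] - reps[j])
                if d < best_d:
                    best_d, best = d, (i, j)
        i, j = best
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Comparing only representatives rather than all pairs of points is what gives representative-based schemes their scalability advantage over classic all-pairs linkage; the brute-force pair search above is kept simple for clarity, whereas a tree-structured index would be used at scale.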
Pages: 12