Granular-ball computing-based manifold clustering algorithms for ultra-scalable data

被引:13
作者
Cheng, Dongdong [1 ,2 ,3 ,4 ,5 ]
Liu, Shushu [2 ,3 ,4 ]
Xia, Shuyin [2 ,3 ,4 ]
Wang, Guoyin [2 ,3 ,4 ]
机构
[1] Yangtze Normal Univ, Coll Big Data & Intelligent Engn, Chongqing 408100, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Computat Intelligence, Chongqing 400065, Peoples R China
[3] Chongqing Univ Posts & Telecommun, Key Lab Cyberspace Big Data Intelligent Secur, Minist Educ, Chongqing 400065, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Key Lab Big Data Intelligent Comp, Chongqing 400065, Peoples R China
[5] Yangtze Normal Univ, Chongqing, Peoples R China
基金
中国国家自然科学基金;
关键词
Granular-ball; Manifold learning; Ultra-scalable data; Spectral clustering; Ensemble clustering; PARTITIONS; REDUCTION;
D O I
10.1016/j.eswa.2024.123313
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Manifold learning is essential for analyzing high -dimensional data, but it suffers from high time complexity. To address this, researchers proposed using anchors and constructing a similarity matrix to expedite eigen decomposition and reduce sparse consumption. However, randomly selected anchors fail to represent the data well, and using K -means for anchor generation is time-consuming. In this paper, we introduce Granular -ball (GB) into unsupervised manifold learning, presenting GB-USC and GB-USEC. By employing a coarse -to -fine approach, GB-USC generates high -quality anchors aligned with the data distribution. A bipartite graph is constructed between data points and anchors, enabling low -dimensional manifold embedding using transfer cut. GB-USEC combines multiple GB-USC clusters, generating consistent low -dimensional embeddings across dimensions and determining clustering results through voting. The experimental results show that compared with the state-of-the-art algorithm U -SPEC, GB-USC achieves the similar performance with the average running time of GB-USC is 33.96% less than that of U -SPEC for several million -level datasets. Additionally, our ensemble algorithm improves the clustering efficiency by an average of 29.19% compared with U-SENC.
引用
收藏
页数:13
相关论文
共 44 条
[1]   Cumulative voting consensus method for partitions with a variable number of clusters [J].
Ayad, Hanan G. ;
Kamel, Mohamed S. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008, 30 (01) :160-173
[2]  
Belkin M, 2002, ADV NEUR IN, V14, P585
[3]   Salient Object Detection: A Benchmark [J].
Borji, Ali ;
Cheng, Ming-Ming ;
Jiang, Huaizu ;
Li, Jia .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (12) :5706-5722
[4]  
Cai D., 2007, Proceedings of the 15th international conference on Multimedia, P403, DOI DOI 10.1145/1291233.1291329
[5]   Large Scale Spectral Clustering Via Landmark-Based Sparse Representation [J].
Cai, Deng ;
Chen, Xinlei .
IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (08) :1669-1680
[6]   Parallel Spectral Clustering in Distributed Systems [J].
Chen, Wen-Yen ;
Song, Yangqiu ;
Bai, Hongjie ;
Lin, Chih-Jen ;
Chang, Edward Y. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (03) :568-586
[7]   A Novel Approximate Spectral Clustering Algorithm With Dense Cores and Density Peaks [J].
Cheng, Dongdong ;
Huang, Jinlong ;
Zhang, Sulan ;
Zhang, Xiaohua ;
Luo, Xin .
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (04) :2348-2360
[8]   Clustering with Local Density Peaks-Based Minimum Spanning Tree [J].
Cheng, Dongdong ;
Zhu, Qingsheng ;
Huang, Jinlong ;
Wu, Quanwang ;
Yang, Lijun .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (02) :374-387
[9]  
Cristofor D, 2002, J UNIVERS COMPUT SCI, V8, P153
[10]  
Ester Martin., 1996, P ACM SIGKDD INT C K, V96, P226, DOI DOI 10.5555/3001460.3001507