Adaptive Graph K-Means

被引:0
作者
Pei, Shenfei [1 ]
Sun, Yuanchen [2 ]
Nie, Feiping [3 ]
Jiang, Xudong [4 ]
Zheng, Zengwei [1 ]
机构
[1] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310015, Zhejiang, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
[3] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[4] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
基金
中国国家自然科学基金;
关键词
Machine learning; Clustering; Graph-based; k-means; Computational efficiency;
D O I
10.1016/j.patcog.2024.111226
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Clustering large-scale datasets has received increasing attention recently. However, existing algorithms are still not efficient in scenarios with extremely large number of clusters. To this end, Adaptive Graph K-Means (AGKM) is proposed in this work. Its idea originates from k-means, but it operates on an adaptive k-Nearest Neighbor (k-NN) graph instead of data features. First, AGKM is highly efficient for processing datasets where both the numbers of samples and clusters are very large. Specifically, the time and space complexity are both linear w.r.t the number of samples and, more importantly, independent to the cluster number. Second, AGKM is designed for balanced clusters. This constraint is realized by adding a regularization term in loss function, and a simple modification of the graph in optimization algorithm, which does not increase the computational burden. Last, the indicator and dissimilarity matrices are learned simultaneously, so that the proposed AGKM obtains the final partition directly with higher efficacy and efficiency. Experiments on several datasets validate the advantages of AGKM. In particular, over 29X and 46X speed-ups with respect to k-means are observed on the two large-scale datasets WebFace and CelebA, respectively.
引用
收藏
页数:11
相关论文
共 50 条
[31]   An adaptive enhancement algorithm for infrared video based on modified k-means clustering [J].
Zhang, Linze ;
Wang, Jingqi ;
Wu, Wen .
INFRARED SENSORS, DEVICES, AND APPLICATIONS VI, 2016, 9974
[32]   Global k-means plus plus : an effective relaxation of the global k-means clustering algorithm [J].
Vardakas, Georgios ;
Likas, Aristidis .
APPLIED INTELLIGENCE, 2024, 54 (19) :8876-8888
[33]   An Efficient MapReduce-based Adaptive K-Means Clustering for Large Dataset [J].
Chowdhury, Tapan ;
Mukherjee, Arijit ;
Chakraborty, Susanta .
2017 3RD IEEE INTERNATIONAL SYMPOSIUM ON NANOELECTRONIC AND INFORMATION SYSTEMS (INIS), 2017, :157-162
[34]   K-polytopes: a superproblem of k-means [J].
Yigit Oktar ;
Mehmet Turkan .
Signal, Image and Video Processing, 2019, 13 :1207-1214
[35]   K-polytopes: a superproblem of k-means [J].
Oktar, Yigit ;
Turkan, Mehmet .
SIGNAL IMAGE AND VIDEO PROCESSING, 2019, 13 (06) :1207-1214
[36]   How to Estimate K Value without Domain Knowledge in K-Means [J].
Shin, Wonsup ;
Cho, Kuk-Hyun ;
Kim, Jung-Jae .
2017 3RD INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS (ICCAR), 2017, :701-704
[37]   In Search of Star Clusters: An Introduction to the K-Means Algorithm [J].
Ferreira Nascimento, Marcio Luis .
JOURNAL OF HUMANISTIC MATHEMATICS, 2022, 12 (01) :243-255
[38]   Enhancing the K-means Algorithm Using Cluster Adjustment [J].
Yamout, Fadi .
2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023, :307-311
[39]   A Survey on K-Means Clustering for Analyzing Variation in Data [J].
Patil, Pratik ;
Karthikeyan, A. .
INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES, ICICCT 2019, 2020, 89 :317-323
[40]   Using Annealing to Accelerate Triangle Inequality k-means [J].
Zhakubayev, Alibek ;
Hamerly, Greg .
2024 IEEE 11TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS, DSAA 2024, 2024, :128-136