cs-means: Determining optimal number of clusters based on a level-of-similarity

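Times cited: 1
Authors
Lamsal, Rabindra [1]
Katiyar, Shubham [1]
Affiliation
[1] Jawaharlal Nehru Univ, Sch Comp & Syst Sci, New Delhi 110067, India
Source
SN APPLIED SCIENCES | 2020 / Vol. 2 / Issue 11
Keywords
Unsupervised; Clustering; Centroid-based
DOI
10.1007/s42452-020-03582-5
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification codes
07; 0710; 09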
Abstract
This paper proposes cs-means, a centroid-based clustering algorithm that clusters data points with n features without requiring the number of clusters to be specified in advance. The core of the algorithm is a similarity measure that collectively determines whether an incoming data point is assigned to an existing cluster or whether a new cluster is created for it. The algorithm is application-specific. It is intended for clustering a stream of data points in which the similarity between each incoming data point and the cluster it joins must exceed a predefined level of similarity (cluster strictness). The algorithm was evaluated on four public datasets and ten isotropic Gaussian blobs, and the resulting cluster analysis strongly confirms the objectives of the proposed algorithm.
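The assign-or-create rule described in the abstract can be illustrated with a minimal single-pass sketch. The abstract does not specify the cs-means similarity measure. The `similarity` function below (1 / (1 + Euclidean distance)) and the running-mean centroid update are therefore stand-in assumptions, and `ThresholdClusterer` is a hypothetical name. This is a generic threshold-based (leader-style) clusterer, not the authors' implementation.

```python
import math
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED stand-in similarity in (0, 1]: 1 / (1 + Euclidean distance).
    The actual cs-means measure is not given in the abstract."""
    return 1.0 / (1.0 + math.dist(a, b))


class ThresholdClusterer:
    """Hypothetical single-pass clusterer: join the most similar cluster if its
    similarity exceeds `strictness`, otherwise open a new cluster."""

    def __init__(self, strictness: float) -> None:
        # strictness plays the role of the predefined level-of-similarity
        self.strictness = strictness
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def add(self, point: Sequence[float]) -> int:
        """Assign one streamed point and return the index of its cluster."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            s = similarity(point, centroid)
            if s > best_sim:
                best, best_sim = i, s
        if best >= 0 and best_sim > self.strictness:
            # ASSUMED update rule: incremental (running) mean of member points
            self.counts[best] += 1
            n = self.counts[best]
            self.centroids[best] = [
                m + (x - m) / n for m, x in zip(self.centroids[best], point)
            ]
            return best
        # No existing cluster is similar enough, so the point seeds a new cluster
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1


if __name__ == "__main__":
    stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, -0.1)]
    model = ThresholdClusterer(strictness=0.5)
    labels = [model.add(p) for p in stream]
    print(labels)  # [0, 0, 1, 1, 0]
```

No cluster count is passed in, so the number of clusters emerges from `strictness`. With this stand-in measure, similarity exceeds `s` exactly when the distance is below `1/s - 1`. Raising `strictness` therefore shrinks the radius within which a point may join an existing centroid, and more clusters are opened.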
Pages: 9