cs-means: Determining optimal number of clusters based on a level-of-similarity

Cited by: 1

Authors:

Lamsal, Rabindra [1]

Katiyar, Shubham [1]

Affiliations:

[1] Jawaharlal Nehru Univ, Sch Comp & Syst Sci, New Delhi 110067, India

Source:

SN APPLIED SCIENCES | 2020 / Vol. 2 / Issue 11

Keywords:

Unsupervised; Clustering; Centroid-based;

DOI:

10.1007/s42452-020-03582-5

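Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];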

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

This paper proposes cs-means, a centroid-based clustering algorithm that clusters data-points with n features without requiring the number of clusters to be specified in advance. The core logic of the algorithm is a similarity measure that decides whether to assign an incoming data-point to a pre-existing cluster or to create a new cluster and assign the data-point to it. The algorithm is application-specific: it applies when the task is clustering analysis of a stream of data-points in which the similarity between an incoming data-point and the cluster it is assigned to must be higher than a predefined level-of-similarity (cluster strictness). The algorithm was evaluated on 4 public datasets and 10 isotropic Gaussian blobs. The cluster analysis strongly confirms the objectives of the proposed clustering algorithm.
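The assign-or-create decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the similarity measure or the centroid update, so the `similarity` function (1 / (1 + Euclidean distance)), the running-mean centroid update, and the names `stream_cluster` and `strictness` are all assumptions chosen for illustration.

```python
import math


def similarity(a, b):
    """ASSUMED similarity in (0, 1]: 1 / (1 + Euclidean distance).
    The paper's actual measure is not given in the abstract."""
    return 1.0 / (1.0 + math.dist(a, b))


def stream_cluster(points, strictness):
    """Process points one at a time; join the most similar cluster if its
    similarity exceeds `strictness`, otherwise open a new cluster."""
    centroids = []  # running mean of each cluster (assumed update rule)
    counts = []     # number of points assigned to each cluster
    labels = []     # cluster index chosen for each incoming point
    for p in points:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            s = similarity(p, c)
            if s > best_sim:
                best, best_sim = i, s
        if best >= 0 and best_sim > strictness:
            # Similar enough: assign and move the centroid toward the point.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [ci + (pi - ci) / n
                               for ci, pi in zip(centroids[best], p)]
        else:
            # Not similar enough to any existing cluster: create a new one.
            centroids.append(list(p))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return labels, centroids


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
labels, centroids = stream_cluster(points, strictness=0.5)
print(labels, len(centroids))
```

In this sketch the number of clusters is never passed in; it emerges from the threshold. Raising `strictness` toward 1 tends to produce more, tighter clusters, while lowering it merges points into fewer clusters.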

Pages: 9
