K-Cosine-Means Clustering Algorithm

Cited by: 6

Authors:

Khan, Md Kafi [1]

Sarker, Sakil [1]

Ahmed, Syed Mahmud [1]

Khan, Mozammel H. A. [1]

Affiliation:

[1] East West Univ, Dept Comp Sci & Engn, Dhaka 1212, Bangladesh

Source:

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND INFORMATION TECHNOLOGY 2021 (ICECIT 2021) | 2021

Keywords:

Centroids initialization; Cosine similarity; K-Cosine-Means clustering; K-Means clustering; Unsupervised learning;

DOI:

10.1109/ICECIT54077.2021.9641480

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The K-means algorithm is one of the most widely used unsupervised clustering techniques in data mining. This paper presents an extension of the K-means algorithm named the K-cosine-means algorithm. While the K-means algorithm initializes the centroids randomly and uses the Euclidean distance measure to assign data points to clusters, our proposed algorithm inherits a systematic approach from K-means++ to initialize the centroids and uses cosine similarity to assign data points to clusters. We performed experiments on two homogeneous datasets (Iris and Seeds) and one heterogeneous dataset (Hepatitis). On the homogeneous datasets, we observed better clustering accuracy than K-means and its other variants, namely K-means, iK-means, K-means++, WK-means, MWK-means, iWK-means, and iMWK-means. On the heterogeneous dataset, however, we observed better clustering accuracy than the standard K-means, K-means++, and iK-means algorithms.
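The two ingredients named in the abstract, K-means++-style seeding and cosine-similarity assignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which distance the seeding step uses or how centroids are updated, so the sketch assumes squared cosine distance for seeding and a plain coordinate-wise mean for the update. The function names `cosine_similarity`, `seed_centroids`, and `k_cosine_means` are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def seed_centroids(points, k, rng):
    """K-means++-style seeding. ASSUMPTION: weights use squared cosine distance."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Weight each point by its squared cosine distance to the nearest
        # centroid chosen so far, so far-away directions are favoured.
        weights = [
            max(0.0, min(1.0 - cosine_similarity(p, c) for c in centroids)) ** 2
            for p in points
        ]
        if sum(weights) == 0.0:
            # Every point is parallel to an existing centroid: pick uniformly.
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centroids


def k_cosine_means(points, k, max_iter=100, seed=0):
    """Cluster points by cosine similarity; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = seed_centroids(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine_similarity(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step. ASSUMPTION: centroid = coordinate-wise mean of members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two directions with very different magnitudes along each one.
points = [(1, 0), (5, 0), (20, 0), (0, 2), (0, 7), (0, 30)]
labels, centroids = k_cosine_means(points, k=2)
```

Because cosine similarity ignores vector length, `(1, 0)` and `(20, 0)` land in the same cluster here even though they are far apart in Euclidean terms. That scale invariance is the main behavioural difference from a Euclidean assignment step.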


Pages: 4

