K-Cosine-Means Clustering Algorithm

Times cited: 6
Authors
Khan, Md Kafi [1]
Sarker, Sakil [1]
Ahmed, Syed Mahmud [1]
Khan, Mozammel H. A. [1]
Affiliations
[1] East West Univ, Dept Comp Sci & Engn, Dhaka 1212, Bangladesh
Source
PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND INFORMATION TECHNOLOGY 2021 (ICECIT 2021) | 2021
Keywords
Centroids initialization; Cosine similarity; K-Cosine-Means clustering; K-Means clustering; Unsupervised learning
DOI
10.1109/ICECIT54077.2021.9641480
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The K-means algorithm is one of the most widely used unsupervised clustering techniques in data mining. This paper presents an extension of the K-means algorithm named the K-cosine-means algorithm. While the K-means algorithm initializes the centroids randomly and uses the Euclidean distance measure to assign data points to clusters, our proposed algorithm inherits a systematic approach from K-means++ to initialize the centroids and uses cosine similarity to assign data points to clusters. We have performed experiments on both homogeneous datasets (the Iris and Seeds datasets) and a heterogeneous dataset (the Hepatitis dataset). On the homogeneous datasets, we observed better clustering accuracy than other variants of the K-means algorithm, namely K-means, iK-means, K-means++, WK-means, MWK-means, iWK-means, and iMWK-means. On the heterogeneous dataset, however, we observed better clustering accuracy only compared to the standard K-means, K-means++, and iK-means algorithms.
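The two ingredients named in the abstract (K-means++-style seeding and assignment by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details, so the seeding weight (squared cosine distance), the centroid update (arithmetic mean of the members), the stopping rule, and all function names below are assumptions made for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def init_centroids(points, k, rng):
    """K-means++-style seeding. Assumption: a point is weighted by its squared
    cosine distance (1 - similarity) to the nearest centroid chosen so far."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        weights = [
            min(1.0 - cosine_similarity(p, c) for c in centroids) ** 2
            for p in points
        ]
        if sum(weights) > 0:
            chosen = rng.choices(points, weights=weights, k=1)[0]
        else:
            # Every point is collinear with a centroid: fall back to uniform choice.
            chosen = rng.choice(points)
        centroids.append(list(chosen))
    return centroids


def k_cosine_means(points, k, max_iter=100, seed=0):
    """Cluster points by assigning each one to its most cosine-similar centroid."""
    rng = random.Random(seed)
    centroids = init_centroids(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda j: cosine_similarity(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because cosine similarity depends only on direction, points such as (1, 0.1) and (3, 0.2) land in the same cluster even though their magnitudes differ, which is the main behavioral difference from Euclidean K-means.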
Pages: 4
Related papers
19 in total
  • [1] Abbas O. A., 2008, INT ARAB JOURNAL INF, V5
  • [2] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
  • [3] A CLUSTERING TECHNIQUE FOR SUMMARIZING MULTIVARIATE DATA
    BALL, GH
    HALL, DJ
    [J]. BEHAVIORAL SCIENCE, 1967, 12 (02) : 153 - &
  • [4] An optimization algorithm for clustering using weighted dissimilarity measures
    Chan, EY
    Ching, WK
    Ng, MK
    Huang, JZ
    [J]. PATTERN RECOGNITION, 2004, 37 (05) : 943 - 952
  • [5] Charytanowicz M, 2010, ADV INTEL SOFT COMPU, V69, P15
  • [6] Cordeiro de Amorim R., 2012, PARTITIONAL CLUSTERI
  • [7] de Amorim RC, 2012, INT SYMP COMP INTELL, P13, DOI 10.1109/CINTI.2012.6496753
  • [8] Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering
    de Amorim, Renato Cordeiro
    Mirkin, Boris
    [J]. PATTERN RECOGNITION, 2012, 45 (03) : 1061 - 1075
  • [9] COMPUTER-INTENSIVE METHODS IN STATISTICS
    DIACONIS, P
    EFRON, B
    [J]. SCIENTIFIC AMERICAN, 1983, 248 (05) : 116 - &
  • [10] The use of multiple measurements in taxonomic problems
    Fisher, RA
    [J]. ANNALS OF EUGENICS, 1936, 7 : 179 - 188