K-Means Extensions for Clustering Categorical Data

Cited by: 0
Authors
Alwersh, Mohammed [1]
Kovacs, Laszlo [1]
Affiliations
[1] Univ Miskolc, Dept Informat Technol, Miskolc, Hungary
Keywords
Clustering algorithms; categorical data; k-means; cluster analysis; formal concept analysis; concept lattice; dissimilarity measure; algorithm
DOI
10.14569/IJACSA.2023.0140953
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
knowledge discovery, representing data relationships through concept lattices. However, the complexity of these lattices often hinders interpretation, which motivates methods for simplifying them. In this context, the study proposes clustering the formal concepts within a concept lattice, with the ultimate aim of minimizing lattice size. To this end, the study introduces two novel extensions of the k-means algorithm that handle categorical data efficiently, a crucial aspect of the FCA framework. These extensions, K-means Dijkstra on Lattice (KDL) and K-means Vector on Lattice (KVL), are designed with lattice size reduction in mind; the current study, however, focuses on introducing and refining the methods themselves, leaving lattice size reduction as a future goal. KDL uses FCA to build a graph of formal concepts and their relationships and applies a modified Dijkstra algorithm to measure distances, replacing the Euclidean distance of traditional k-means. Its centroids are the formal concepts with minimal intra-cluster distances, which enables effective clustering of categorical data. In contrast, the KVL extension transforms formal concepts into numerical vectors to leverage the scalability of traditional k-means, potentially at the cost of clustering quality because it overlooks the data's inherent hierarchy. In testing, both KDL and KVL proved robust in managing categorical data. The introduction and demonstration of these techniques lay the groundwork for future research on categorical data clustering within the FCA framework.
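The KDL idea described in the abstract (shortest-path distances on a graph of formal concepts, with each centroid chosen as the concept that minimizes total distance to its cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the Dijkstra algorithm is modified or how edges are weighted, so the sketch assumes plain Dijkstra, unit edge weights, and an undirected graph. The toy lattice and the names `dijkstra`, `cluster_on_graph`, and `graph` are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from `source` over a weighted adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def cluster_on_graph(graph, initial_centroids, max_iter=20):
    """k-means-style loop using graph distances and node-valued centroids.

    Assumes distinct initial centroids and strictly positive edge weights.
    """
    inf = float("inf")
    nodes = sorted(graph)
    # Precompute all-pairs distances with one Dijkstra run per node.
    dist = {n: dijkstra(graph, n) for n in nodes}
    centroids = list(initial_centroids)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each concept joins its nearest centroid.
        clusters = {c: [] for c in centroids}
        for n in nodes:
            nearest = min(centroids, key=lambda c: dist[c].get(n, inf))
            clusters[nearest].append(n)
        # Update step: the new centroid is the member with the smallest
        # total distance to the other members of its cluster.
        new_centroids = [
            min(members, key=lambda m: sum(dist[m].get(x, inf) for x in members))
            for members in clusters.values()
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


# Hypothetical toy concept lattice (Hasse diagram) with unit edge weights.
edges = [("top", "a"), ("top", "b"), ("top", "c"), ("a", "ab"), ("b", "ab"),
         ("b", "bc"), ("c", "bc"), ("ab", "bottom"), ("bc", "bottom")]
graph = {}
for u, v in edges:
    graph.setdefault(u, {})[v] = 1.0
    graph.setdefault(v, {})[u] = 1.0

clusters = cluster_on_graph(graph, ["a", "c"])
for centroid, members in clusters.items():
    print(centroid, members)
```

Because centroids are restricted to existing nodes, the update step resembles k-medoids more than classical k-means; that restriction is what lets the procedure work on categorical data, where a numerical mean is not defined.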
Pages: 492-507
Number of pages: 16
Related papers
26 in total
[1]  
Abiy T., 2016, Dijkstra's shortest path algorithm
[2]  
Alwersh M, 2023, Indonesian Journal of Electrical Engineering and Computer Science, V30, P366, DOI 10.11591/ijeecs.v30.i1.pp366-387
[3]  
Baixeries J, 2009, LECT NOTES ARTIF INT, V5548, P162
[4]  
Bellman Richard., 1958, Quarterly of Applied Mathematics, V16, P87, DOI 10.1090/QAM/102435
[5]   A dissimilarity measure for the k-Modes clustering algorithm [J].
Cao, Fuyuan ;
Liang, Jiye ;
Li, Deyu ;
Bai, Liang ;
Dang, Chuangyin .
KNOWLEDGE-BASED SYSTEMS, 2012, 26 :120-127
[6]  
Chen L., 2013, 23 INT JOINT C ART I
[7]   ALGORITHM-97 - SHORTEST PATH [J].
FLOYD, RW .
COMMUNICATIONS OF THE ACM, 1962, 5 (06) :345-345
[8]  
Ganter Bernhard, 1984, PREPRINT, DOI 10.1007/978-3-642-11928-6_22
[9]  
Ganter R., 1999, Formal concept analysis: Mathematical foundations, DOI 10.1007/978-3-642-59830-2
[10]  
Ganti Venkatesh., 1999, INT C KNOWLEDGE DISC, P73, DOI 10.1145/312129.312201