Improved Clustering for Categorical Data with Genetic Algorithm

Cited by: 1
Authors
Sharma, Abha [1]
Thakur, R. S. [1]
Affiliations
[1] Maulana Azad Natl Inst Technol, Bhopal, India
Source
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MICROELECTRONICS, COMPUTING & COMMUNICATION SYSTEMS, MCCS 2015 | 2018 / Vol. 453
Keywords
Clustering; Categorical data; Genetic algorithm; k-modes algorithm
DOI
10.1007/978-981-10-5565-2_6
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Clustering is a central task in unsupervised learning; its aim is to partition a data set into homogeneous groups called clusters. Many real-world data sets contain categorical values, but many clustering algorithms work only on numeric values, which limits their use in data mining. The k-modes algorithm is one of the more effective methods for partitioning categorical data sets, but it stops at a locally optimal solution that depends on the initial cluster centres. The proposed algorithm uses a genetic algorithm (GA) to optimize k-modes clustering. The rationale is that a candidate solution that takes noisy points as cluster centres has a high cost, so it is not carried into the next iteration, and the search therefore avoids getting stuck in suboptimal solutions. The proposed algorithm is evaluated on several real-life data sets in terms of accuracy; the results indicate that it is efficient and gives encouraging results, especially for large data sets.
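The idea in the abstract can be illustrated with a minimal sketch. It assumes the simple matching dissimilarity commonly used with k-modes and a deliberately simplified GA (truncation selection plus mutation, no crossover). The abstract does not state the paper's actual operators or parameters, so the function names (`ga_kmodes`, `update_modes`, and so on) and all default values here are hypothetical.

```python
import random
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(data, modes):
    """Index of the nearest mode for every record (ties go to the lowest index)."""
    return [min(range(len(modes)), key=lambda j: mismatch(row, modes[j]))
            for row in data]

def cost(data, modes):
    """Total dissimilarity of each record to its nearest mode."""
    return sum(min(mismatch(row, m) for m in modes) for row in data)

def update_modes(data, labels, modes):
    """One k-modes update: most frequent value per attribute in each cluster."""
    new_modes = []
    for j, old in enumerate(modes):
        members = [row for row, lab in zip(data, labels) if lab == j]
        if not members:
            new_modes.append(old)  # an empty cluster keeps its previous mode
            continue
        new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                               for col in zip(*members)))
    return new_modes

def ga_kmodes(data, k, pop_size=10, generations=20, mutation_rate=0.1, seed=0):
    """Assumed, simplified GA over sets of k modes (not the paper's exact method)."""
    rng = random.Random(seed)
    # Each chromosome is a list of k modes, initialised from random records.
    population = [rng.sample(data, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Local refinement: one k-modes step per chromosome.
        population = [update_modes(data, assign(data, m), m) for m in population]
        # Selection: high-cost chromosomes (e.g. noisy centres) are dropped.
        population.sort(key=lambda m: cost(data, m))
        survivors = population[: pop_size // 2]
        # Mutation: occasionally replace a mode with a random record.
        children = []
        for parent in survivors:
            child = list(parent)
            for j in range(k):
                if rng.random() < mutation_rate:
                    child[j] = rng.choice(data)
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda m: cost(data, m))
    return best, assign(data, best), cost(data, best)

if __name__ == "__main__":
    # Toy categorical data set with two clean groups (illustrative only).
    toy = [("a", "x", "p")] * 5 + [("b", "y", "q")] * 5
    modes, labels, total = ga_kmodes(toy, k=2)
    print(modes, labels, total)
```

Because the cheaper half of the population always survives unchanged, the best cost found never gets worse from one generation to the next, which is the property the abstract relies on when it says high-cost centres are not carried forward.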
Pages: 67-76
Number of pages: 10