An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood

Cited by: 72
Authors
Ding, Shifei [1]
Du, Mingjing [1]
Sun, Tongfeng [1]
Xu, Xiao [1]
Xue, Yu [2]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Entropy; Density peaks clustering; Mixed type data; Fuzzy neighborhood; SIMILARITY;
DOI
10.1016/j.knosys.2017.07.027
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most clustering algorithms assume that data contain only numerical values. In practice, however, data sets with both numerical and categorical attributes are ubiquitous in real-world tasks, and grouping such data effectively is an important yet challenging problem. Most current algorithms are sensitive to initialization and are generally unsuitable for data with non-spherical distributions. To address this, we propose an entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood (DP-MD-FN). First, we propose a new similarity measure with a uniform criterion for both categorical and numerical attributes, which avoids feature transformation and parameter adjustment between categorical and numerical values. We integrate this entropy-based strategy with the density peaks clustering method. In addition, to improve the robustness of the original algorithm, we use a fuzzy neighborhood relation to redefine the local density. Furthermore, to select the cluster centers automatically, we develop a simple determination strategy based on the gamma-graph. The method can handle three types of data: numerical, categorical, and mixed. We compare the performance of our algorithm with traditional clustering algorithms such as K-Modes, K-Prototypes, KL-FCM-GM, EKP and OCIL. Experiments on different benchmark data sets demonstrate the effectiveness and robustness of the proposed algorithm. (C) 2017 Elsevier B.V. All rights reserved.
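The abstract builds on the density peaks clustering method and on a gamma-based choice of cluster centers. As a rough illustration only, the sketch below implements the generic density peaks idea on purely numerical points, using Euclidean distance and a Gaussian kernel density. It is not DP-MD-FN: the entropy-based similarity measure and the fuzzy neighborhood density are not specified in this record and are not reproduced here. The function name `density_peaks`, the cutoff parameter `d_c`, and the fixed `n_clusters` argument are assumptions made for the example, and gamma is taken as the product of local density and separation distance, as it is commonly defined for density peaks clustering.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Generic density peaks clustering sketch (assumed helper, not DP-MD-FN)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho: Gaussian-kernel neighbour count with scale d_c.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Visit points from densest to sparsest (stable sort keeps ties in index order).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # index of that nearest higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # gamma = rho * delta; points with the largest gamma are taken as centers.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_clusters]

    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    # Every remaining point inherits the label of its denser nearest neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, gamma
```

Calling `density_peaks(points, d_c=1.0, n_clusters=2)` on two well-separated groups of points returns one label per point plus the gamma values, which can be sorted in decreasing order to inspect how sharply the candidate centers stand out. The abstract describes an automatic selection strategy; this sketch simply takes the number of clusters as an input.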
Pages: 294-313
Page count: 20