A novel density peaks clustering algorithm for mixed data

Cited by: 46
Authors
Du, Mingjing [1]
Ding, Shifei [1, 2]
Xue, Yu [3]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Jiangsu, Peoples R China
Keywords
Data clustering; Density peaks; Entropy; Mixed data; Similarity
DOI
10.1016/j.patrec.2017.07.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The density peaks clustering (DPC) algorithm is well known for its effectiveness on data sets with non-spherical distributions. However, it works only on numerical values, which prevents it from being used to cluster real-world data containing both categorical and numerical values. Traditional clustering algorithms for mixed data rely on pre-processing based on binary encoding, but such methods destroy the original structure of categorical attributes. Other solutions based on simple matching, such as K-Prototypes, need a user-defined parameter to avoid favoring either type of attribute. To overcome these problems, we present a novel clustering algorithm for mixed data, called DPC-MD. We improve DPC by using a new similarity criterion that handles three types of data: numerical, categorical, and mixed. Compared with other methods for mixed data, DPC is better suited to non-spherically distributed data. The core of the proposed method is a new similarity measure for mixed data, designed to avoid feature transformation and parameter adjustment. The performance of our method is demonstrated by experiments on several real-world data sets, in comparison with traditional clustering algorithms such as K-Modes, K-Prototypes, EKP, and SBAC. (C) 2017 Elsevier B.V. All rights reserved.
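As a rough illustration of the idea in the abstract, the sketch below runs the standard density peaks steps (local density, distance to the nearest denser point, center selection, label propagation) on a precomputed mixed-data distance matrix. The distance used here is a simple Gower-style average of range-scaled numeric differences and categorical mismatches. It is an assumed stand-in and not the similarity measure proposed in the paper, and the function names `mixed_distance` and `density_peaks`, the cutoff `dc`, and the Gaussian kernel are illustrative choices rather than details taken from the record.

```python
import numpy as np


def mixed_distance(num, cat):
    """Pairwise Gower-style distance for mixed data (illustrative stand-in,
    NOT the paper's measure): range-scaled L1 on numeric columns plus simple
    mismatch on categorical columns, averaged over all attributes."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    span = num.max(axis=0) - num.min(axis=0)
    span[span == 0] = 1.0  # constant columns contribute zero distance
    d_num = (np.abs(num[:, None, :] - num[None, :, :]) / span).sum(axis=2)
    d_cat = (cat[:, None, :] != cat[None, :, :]).sum(axis=2)
    return (d_num + d_cat) / (num.shape[1] + cat.shape[1])


def density_peaks(dist, dc, n_clusters):
    """Basic density peaks clustering on a precomputed distance matrix."""
    n = dist.shape[0]
    # Local density with a Gaussian kernel; subtract 1 to exclude the point itself.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist.max()  # convention for the densest point
            continue
        higher = order[:rank]  # points ranked as denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centers: largest rho * delta; the densest point is always kept as a center.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


# Toy mixed data: one numeric and one categorical attribute, two obvious groups.
num = [[1.0], [1.1], [0.9], [5.0], [5.1], [4.9]]
cat = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]]
labels = density_peaks(mixed_distance(num, cat), dc=0.2, n_clusters=2)
print(labels)
```

Because the clustering step only consumes a distance matrix, a different mixed-data measure could be swapped in for `mixed_distance` without changing `density_peaks`.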
Pages: 46-53
Number of pages: 8
Related papers
  • [1] A fuzzy mixed data clustering algorithm by fast search and find of density peaks
    Li, Ye
    Chen, Yiyan
    Li, Qun
    INTELLIGENT DATA ANALYSIS, 2019, 23 : S199 - S224
  • [2] A Novel Density Peaks Clustering Algorithm Based on Local Reachability Density
    Hanqing Wang
    Bin Zhou
    Jianyong Zhang
    Ruixue Cheng
    International Journal of Computational Intelligence Systems, 2020, 13 : 690 - 697
  • [3] A Novel Density Peaks Clustering Algorithm Based on Local Reachability Density
    Wang, Hanqing
    Zhou, Bin
    Zhang, Jianyong
    Cheng, Ruixue
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2020, 13 (01) : 690 - 697
  • [4] Clustering Mixed Data by Fast Search and Find of Density Peaks
    Liu, Shihua
    Zhou, Bingzhong
    Huang, Decai
    Shen, Liangzhong
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2017, 2017
  • [5] A novel density peaks clustering algorithm based on Hopkins statistic
    Zhang, Ruilin
    Miao, Zhenguo
    Tian, Ye
    Wang, Hongpeng
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 201
  • [6] Study on Density Peaks Clustering Algorithm of Vehicle Trajectory Data
    Jiang H.
    Lu B.
    Li A.
    Qiche Gongcheng/Automotive Engineering, 2023, 45 (07) : 1153 - 1162
  • [7] An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood
    Ding, Shifei
    Du, Mingjing
    Sun, Tongfeng
    Xu, Xiao
    Xue, Yu
    KNOWLEDGE-BASED SYSTEMS, 2017, 133 : 294 - 313
  • [8] Manifold Density Peaks Clustering Algorithm
    Xu, Xiaohua
    Ju, Yongsheng
    Liang, Yali
    He, Ping
    2015 THIRD INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA, 2015 : 311 - 318
  • [9] Survey on Density Peaks Clustering Algorithm
    Xu X.
    Ding S.-F.
    Ding L.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (05) : 1800 - 1816
  • [10] Clustering Mixed Data Based on Density Peaks and Stacked Denoising Autoencoders
    Duan, Baobin
    Han, Lixin
    Gou, Zhinan
    Yang, Yi
    Chen, Shuangshuang
    SYMMETRY-BASEL, 2019, 11 (02)