A novel DBSCAN with entropy and probability for mixed data

Cited by: 17
Authors
Liu, Xingxing [1]
Yang, Qing [2]
He, Ling [1]
Affiliations
[1] Wuhan Univ Technol, Wuhan, Hubei, Peoples R China
[2] Wuhan Univ Technol, Discipline Econ & Management Technol, Wuhan, Hubei, Peoples R China
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2017 / Vol. 20 / No. 2
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China; National Social Science Foundation of China
Keywords
Distance measure; Density-based clustering; Conversion; Entropy; Mixed data; ALGORITHM; DISTANCE
DOI
10.1007/s10586-017-0818-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
In big-data settings, detecting clusters of different sizes and shapes is a challenging and pressing task. Over the last several years, density-based clustering approaches have been widely used in many areas of science because of their simplicity and their ability to detect clusters of different sizes and shapes. Using diverse conversions of categorical data, a modified version of the DBSCAN algorithm is proposed to cluster mixed data: a density-based clustering algorithm for mixed data that integrates entropy and probability distribution (EPDCA). Optional conversions are provided so that the clustering process can adapt to the data. Several benchmark data sets from the UCI repository were selected to test the capability and validity of EPDCA. The results show that EPDCA considerably improves clustering, especially in automatically determining the number of clusters, discovering noise, and reducing the time needed to form clusters.
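The abstract does not give EPDCA's conversion formulas, so the Python sketch below is not the authors' method. It only illustrates the general idea of running DBSCAN on mixed numeric and categorical data under stated assumptions: numeric attributes are min-max scaled, and each categorical mismatch is penalized by that attribute's normalized Shannon entropy (a swapped-in, hypothetical weighting). The function names `minmax_scale`, `entropy_weights`, `mixed_distance`, and `dbscan`, and the parameters `eps` and `min_pts`, are illustrative. As in standard DBSCAN, the number of clusters is not fixed in advance, and points in sparse regions are labeled `-1` (noise).

```python
import math
from collections import Counter


def minmax_scale(rows):
    """Scale each numeric column to [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def entropy_weights(cat_rows):
    """Normalized Shannon entropy per categorical column (assumed weighting).

    A column with a single value gets weight 0.0; a uniformly distributed
    column gets weight 1.0 (up to rounding).
    """
    weights = []
    for col in zip(*cat_rows):
        counts = Counter(col)
        n = len(col)
        k = len(counts)
        h = -sum((c / n) * math.log(c / n) for c in counts.values())
        weights.append(h / math.log(k) if k > 1 else 0.0)
    return weights


def mixed_distance(i, j, num, cat, weights):
    """Euclidean distance on scaled numerics plus entropy-weighted mismatches."""
    sq = sum((a - b) ** 2 for a, b in zip(num[i], num[j]))
    # Each mismatched categorical attribute acts like a coordinate difference of w.
    sq += sum(w ** 2 for a, b, w in zip(cat[i], cat[j], weights) if a != b)
    return math.sqrt(sq)


def dbscan(num_rows, cat_rows, eps, min_pts):
    """Plain DBSCAN over the hypothetical mixed distance; -1 marks noise."""
    num = minmax_scale(num_rows)
    weights = entropy_weights(cat_rows)
    n = len(num)
    # Brute-force eps-neighborhoods (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if mixed_distance(i, j, num, cat_rows, weights) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not yet assigned
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                stack.extend(neighbors[j])  # only core points expand the cluster
        cluster += 1
    return labels
```

For example, with `num = [[1.0], [1.1], [1.2], [8.0], [8.1], [8.2], [20.0]]`, `cat = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"], ["c"]]`, `eps=0.05`, and `min_pts=3`, the sketch returns `[0, 0, 0, 1, 1, 1, -1]`: two clusters plus one noise point. The brute-force neighborhood search is O(n²), which suits illustration but not large data sets, where a precomputed distance matrix or an index structure would be more practical.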
Pages: 1313-1323
Page count: 11