Soft subspace clustering of categorical data with probabilistic distance

Cited by: 39
Authors
Chen, Lifei [1, 2]
Wang, Shengrui [3]
Wang, Kaijun [1, 2]
Zhu, Jianping [4, 5]
Affiliations
[1] Fujian Normal Univ, Sch Math & Comp Sci, Fuzhou 350117, Fujian, Peoples R China
[2] Fujian Normal Univ, Fujian Prov Key Lab Network Secur & Cryptol, Fuzhou 350117, Fujian, Peoples R China
[3] Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
[4] Xiamen Univ, Sch Management, Xiamen 361005, Peoples R China
[5] Xiamen Univ, Data Mining Res Ctr, Xiamen 361005, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; National Natural Science Foundation of China;
Keywords
Subspace clustering; Categorical data; Distance measure; Attribute weighting; Kernel density estimation; ALGORITHM;
DOI
10.1016/j.patcog.2015.09.027
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Categorical data clustering is an important subject in pattern recognition. Currently, subspace clustering of categorical data remains an open problem due to the difficulties in estimating attribute interestingness according to the statistics of categories in clusters. In this paper, a new algorithm is proposed for clustering categorical data with a novel soft feature-selection scheme, by which each categorical attribute is automatically assigned a weight that correlates with the smoothed dispersion of the categories in a cluster. In the proposed algorithm, dissimilarity between categorical data objects is measured using a probabilistic distance function, based on kernel density estimation for categorical attributes. We also make use of the probabilistic distances to define a cluster validity index for estimating the number of categorical clusters. The suitability of the proposal is demonstrated in an empirical study done with some widely used real-world data sets and synthetic data sets, and the results show its outstanding performance. (C) 2015 Elsevier Ltd. All rights reserved.
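The abstract names three ingredients: kernel density estimation for categorical attributes, a probabilistic distance, and weights tied to the smoothed dispersion of categories in a cluster. The sketch below illustrates these ideas for a single attribute only. It assumes the Aitchison-Aitken kernel, a squared Euclidean distance between an indicator vector and the smoothed probabilities, and a Gini-style dispersion; none of these formulas is given in the abstract, so the function names, the bandwidth `lam`, and the formulas themselves are illustrative assumptions, not the authors' definitions.

```python
from collections import Counter


def smoothed_probs(values, categories, lam):
    """Kernel-smoothed category probabilities for one attribute in one cluster.

    Assumption: Aitchison-Aitken kernel with bandwidth lam in [0, (m-1)/m],
    where m is the number of categories. lam = 0 gives raw frequencies;
    lam = (m-1)/m gives the uniform distribution.
    """
    m = len(categories)
    if m < 2:
        raise ValueError("need at least two categories")
    if not values:
        raise ValueError("need at least one observed value")
    if not 0.0 <= lam <= (m - 1) / m:
        raise ValueError("lam must lie in [0, (m-1)/m]")
    n = len(values)
    counts = Counter(values)
    probs = {}
    for c in categories:
        f = counts[c] / n  # raw relative frequency of category c
        probs[c] = (1.0 - lam) * f + lam / (m - 1) * (1.0 - f)
    return probs


def probabilistic_distance(value, probs):
    """Squared distance between the indicator vector of `value` and `probs`.

    Assumption: one plausible reading of a "probabilistic distance".
    """
    return sum(((1.0 if c == value else 0.0) - p) ** 2 for c, p in probs.items())


def dispersion(probs):
    """Gini-style dispersion of the smoothed distribution (assumed form).

    A low value means the categories are concentrated, which would suggest
    a more informative attribute for that cluster.
    """
    return 1.0 - sum(p * p for p in probs.values())


if __name__ == "__main__":
    cluster_values = ["a", "a", "a", "b"]  # toy data for one attribute
    p = smoothed_probs(cluster_values, ["a", "b", "c"], lam=0.3)
    print(p)
    print(probabilistic_distance("a", p), probabilistic_distance("c", p))
    print(dispersion(p))
```

In this toy setup an object whose value is the cluster's dominant category lies closer to the cluster than one carrying an unseen category, and smoothing keeps the unseen category's probability above zero.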
Pages: 322-332
Page count: 11
Related papers
50 in total
  • [1] An entropy-based subspace clustering algorithm for categorical data
    Carbonera, Joel Luis
    Abel, Mara
    2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2014, : 272 - 277
  • [2] Kernel Subspace Clustering Algorithm for Categorical Data
    Xu K.-P.
    Chen L.-F.
    Sun H.-J.
    Wang B.-Z.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (11) : 3492 - 3505
  • [3] Parallel Hierarchical Subspace Clustering of Categorical Data
    Pang, Ning
    Zhang, Jifu
    Zhang, Chaowei
    Qin, Xiao
    IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (04) : 542 - 555
  • [4] Subspace Clustering with Feature Grouping for Categorical Data
    Jia, Hong
    Dong, Menghan
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT I, KSEM 2023, 2023, 14117 : 247 - 254
  • [5] A weighting k-modes algorithm for subspace clustering of categorical data
    Cao, Fuyuan
    Liang, Jiye
    Li, Deyu
    Zhao, Xingwang
    NEUROCOMPUTING, 2013, 108 : 23 - 30
  • [6] ICE: Incremental Subspace Clustering of High-Dimensional Categorical Data
    Pang, Ning
    Zhang, Chaowei
    Zhang, Jifu
    Qin, Xiao
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2025, 33 (01) : 87 - 118
  • [7] Clustering categorical data based on distance vectors
    Zhang, P
    Wang, XG
    Song, PXK
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) : 355 - 367
  • [8] From Context to Distance: Learning Dissimilarity for Categorical Data Clustering
    Ienco, Dino
    Pensa, Ruggero G.
    Meo, Rosa
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2012, 6 (01)
  • [9] Rough subspace-based clustering ensemble for categorical data
    Gao, Can
    Pedrycz, Witold
    Miao, Duoqian
    SOFT COMPUTING, 2013, 17 (09) : 1643 - 1658
  • [10] Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters
    Jia, Hong
    Cheung, Yiu-Ming
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (08) : 3308 - 3325