Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection

Cited by: 4
Authors
Chen, Hui [1 ,2 ]
Xu, Kunpeng [3 ]
Chen, Lifei [4 ]
Jiang, Qingshan [1 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[2] Univ Chinese Acad Sci, Shenzhen Coll Adv Technol, Shenzhen 518055, Peoples R China
[3] Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
[4] Fujian Normal Univ, Coll Comp & Cyber Secur, Fuzhou 350007, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization; kernel clustering; K-MODES ALGORITHM; IMPACT;
DOI
10.3390/math9141680
Chinese Library Classification
O1 [Mathematics];
Discipline classification codes
0701; 070101;
Abstract
Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, existing kernel clustering methods remain limited by their assumptions that features are independent and equally weighted. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) that uses a self-expressive kernel density estimation (SKDE) scheme together with a new feature-weighted non-linear similarity measure. Within SKSCC, we propose an effective non-linear optimization method for solving the clustering objective function. The method considers the relationships between attributes in a non-linear space and assigns each attribute a weight that measures its degree of correlation. Experiments on widely used synthetic and real-world datasets demonstrated that the proposed algorithm is more effective and efficient than other state-of-the-art methods at exploring non-linear relationships among attributes.
Pages: 22