Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection

Cited by: 4

Authors:

Chen, Hui [1,2]

Xu, Kunpeng [3]

Chen, Lifei [4]

Jiang, Qingshan [1]

Affiliations:

[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China

[2] Univ Chinese Acad Sci, Shenzhen Coll Adv Technol, Shenzhen 518055, Peoples R China

[3] Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada

[4] Fujian Normal Univ, Coll Comp & Cyber Secur, Fuzhou 350007, Peoples R China

Source:

MATHEMATICS | 2021 / Vol. 9 / Issue 14

Funding:

National Natural Science Foundation of China;

Keywords:

machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization; kernel clustering; K-MODES ALGORITHM; IMPACT;

DOI:

10.3390/math9141680

Chinese Library Classification (CLC):

O1 [Mathematics];

Discipline classification code:

0701 ; 070101 ;

Abstract:

Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, kernel clustering remains challenging because existing methods assume that features are independent and equally weighted. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) that uses a self-expressive kernel density estimation (SKDE) scheme together with a new feature-weighted non-linear similarity measure. Within SKSCC, we propose an effective non-linear optimization method to solve the clustering objective function; it considers the relationships between attributes in a non-linear space and assigns each attribute a weight that measures its degree of correlation. Experiments on widely used synthetic and real-world datasets show that the proposed algorithm is more effective and efficient than other state-of-the-art methods at exploring non-linear relationships among attributes.


Pages: 22

References: 49 in total
