Kernel Penalized K-means: A feature selection method based on Kernel K-means

Cited by: 29
Authors
Maldonado, Sebastian [1]
Carrizosa, Emilio [2]
Weber, Richard [3]
Affiliations
[1] Univ Los Andes, Santiago, Chile
[2] Univ Seville, Fac Matemat, Seville, Spain
[3] Univ Chile, Dept Ingn Ind, FCFM, Santiago, Chile
Keywords
Feature selection; Kernel K-means; Clustering; Microarray data; Classification
DOI
10.1016/j.ins.2015.06.008
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present an unsupervised method that selects the most relevant features using an embedded strategy while maintaining the cluster structure found with the initial feature set. It is based on the idea of simultaneously minimizing the violation of the initial cluster structure and penalizing the use of features via scaling factors. As the base method we use Kernel K-means, which works similarly to K-means, one of the most popular clustering algorithms, but provides more flexibility because it uses kernel functions for distance calculation, thus allowing the detection of more complex cluster structures. We present an algorithm that solves the respective minimization problem iteratively, and we perform experiments with several data sets that demonstrate the superior performance of the proposed method compared to alternative approaches. © 2015 Elsevier Inc. All rights reserved.
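The two ingredients named in the abstract, kernel-based distance calculation and per-feature scaling factors, can be illustrated with a minimal sketch. It is not the authors' algorithm: the kernel form (a Gaussian kernel applied to features multiplied by a scaling vector `v`), the function names `scaled_gaussian_kernel` and `kernel_kmeans`, and the toy data are all assumptions made for illustration, and the penalized optimization over the scaling factors described in the paper is not reproduced here.

```python
import numpy as np

def scaled_gaussian_kernel(X, v):
    """Gaussian kernel on features rescaled by v (assumed form).

    Setting v[j] = 0 removes feature j from the distance entirely.
    """
    Z = X * v                                   # scale each feature column
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.exp(-np.maximum(d2, 0.0))         # clip tiny negative round-off

def kernel_kmeans(K, labels, n_clusters, n_iter=50):
    """Lloyd-style Kernel K-means on a precomputed kernel matrix K."""
    n = K.shape[0]
    labels = labels.copy()
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue                        # empty cluster stays unreachable
            # Squared feature-space distance to the cluster mean:
            # K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data (hypothetical): feature 0 separates two groups, feature 1 is noise.
rng = np.random.default_rng(0)
X = np.column_stack([
    np.concatenate([rng.normal(0.0, 0.1, 20), rng.normal(3.0, 0.1, 20)]),
    rng.normal(0.0, 5.0, 40),
])
v = np.array([1.0, 0.0])                        # scaling factors: drop the noisy feature
K = scaled_gaussian_kernel(X, v)
init = np.array([0] * 15 + [1] * 25)            # deliberately imperfect start
labels = kernel_kmeans(K, init, n_clusters=2)
```

With the noisy feature scaled to zero, the kernel matrix depends only on the informative feature, which is the intuition behind penalizing feature use while trying to keep the cluster structure intact.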
Pages: 150-160
Number of pages: 11