Pairwise dependence-based unsupervised feature selection

Cited by: 69
Authors
Lim, Hyunki [1 ]
Kim, Dae-Won [2 ]
Affiliations
[1] Kyonggi Univ, Div Comp Sci & Engn, 154-42 Gwanggyosan Ro, Suwon 16227, Gyeonggi Do, South Korea
[2] Chung Ang Univ, Sch Comp Sci & Engn, 84 Heukseok Ro, Seoul 06974, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Unsupervised feature selection; Feature dependency; Feature redundancy; Joint entropy; l(2,1) regularization; SPARSE FEATURE-SELECTION; MUTUAL INFORMATION; GRAPH LAPLACIAN; REGRESSION;
DOI
10.1016/j.patcog.2020.107663
CLC classification
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Many research areas involve very high-dimensional data. Because of heavy execution times and large memory requirements, many machine learning methods have difficulty processing such data. In this paper, we propose a new unsupervised feature selection method that considers the pairwise dependence of features (feature dependency-based unsupervised feature selection, or DUFS). To avoid selecting redundant features, the proposed method calculates the dependence among features and incorporates this information into a regression-based unsupervised feature selection process. By using the dependence among features to eliminate redundant ones, a small feature set can be selected. To measure this dependence, we use mutual information, which is widely used in supervised feature selection. To the best of our knowledge, this is the first study to consider the pairwise dependence of features in unsupervised feature selection. Experimental results on six data sets demonstrate that the proposed method outperforms existing state-of-the-art unsupervised feature selection methods in most cases. (C) 2020 Elsevier Ltd. All rights reserved.
Pages: 12
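
To make the pairwise-dependence idea concrete, the following minimal Python sketch estimates mutual information between every pair of (discretized) features and greedily discards the most redundant ones. This is not the authors' DUFS implementation, which embeds the dependence term in a regression-based objective with l(2,1) regularization; the function names, the 10-bin discretization, and the greedy removal criterion are assumptions made purely for illustration.

# Minimal illustrative sketch (not the authors' DUFS code): score features by
# pairwise mutual information and greedily drop the most redundant ones.
# The equal-width 10-bin discretization and the greedy criterion are assumptions.
import numpy as np
from sklearn.metrics import mutual_info_score

def pairwise_mi(X, n_bins=10):
    """Mutual information between every pair of (discretized) features."""
    n_features = X.shape[1]
    # Discretize each continuous feature into equal-width bins.
    Xd = np.stack(
        [np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], bins=n_bins)[1:-1])
         for j in range(n_features)], axis=1)
    mi = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            mi[i, j] = mi[j, i] = mutual_info_score(Xd[:, i], Xd[:, j])
    return mi

def select_low_redundancy_features(X, n_select):
    """Greedily keep features whose total dependence on the others is smallest."""
    mi = pairwise_mi(X)
    selected = list(range(X.shape[1]))
    while len(selected) > n_select:
        sub = mi[np.ix_(selected, selected)]
        # Drop the feature that is most redundant with the remaining ones.
        most_redundant = selected[int(np.argmax(sub.sum(axis=1)))]
        selected.remove(most_redundant)
    return sorted(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    X[:, 4] = X[:, 0] + 0.05 * rng.normal(size=200)  # near-duplicate of feature 0
    print(select_low_redundancy_features(X, n_select=5))

The near-duplicate feature carries high mutual information with feature 0, so one of the pair is removed first, mirroring the redundancy-elimination behaviour the abstract describes.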