A systematic evaluation of filter Unsupervised Feature Selection methods

Cited by: 10
Authors
Solorio-Fernandez, Saul [1]
Carrasco-Ochoa, J. Ariel [1]
Martinez-Trinidad, Jose Fco [1]
Affiliations
[1] Inst Nacl Astrofis Opt & Electr, Comp Sci Dept, Luis Enrique Erro 1, Puebla 72840, Mexico
Keywords
Dimensionality reduction; Unsupervised Feature Selection; Filter approach; High dimensional data; Variable selection
DOI
10.1016/j.eswa.2020.113745
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised Feature Selection (UFS) has attracted great interest in recent years because of its practical significance and its application to a wide variety of problems in expert and intelligent systems where unlabeled data appear. In particular, UFS methods based on the filter approach have received more attention due to their efficiency, scalability, and simplicity. However, the literature contains no comprehensive study assessing such UFS methods when they are applied, under the same conditions, to a wide variety of real-world data. To fill this gap, we present a comprehensive, systematic empirical evaluation of the most popular and recent filter UFS methods, assessing their performance in terms of clustering, classification, and runtime. The filter methods in our study were applied to 50 datasets from the UCI Machine Learning Repository and 25 high-dimensional datasets from the ASU Feature Selection Repository. To evaluate whether the outcomes obtained by the assessed methods are statistically significant, the Friedman test and the Holm post hoc procedure were applied to the clustering and classification results. Based on our experiments, we provide practical guidelines and insights for using the filter UFS methods analyzed in our study. (C) 2020 Elsevier Ltd. All rights reserved.
Pages: 26