A review of microarray datasets and applied feature selection methods

Cited by: 452
Authors
Bolon-Canedo, V. [1]
Sanchez-Marono, N. [1]
Alonso-Betanzos, A. [1]
Benitez, J. M. [2]
Herrera, F. [2, 3]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, La Coruna 15071, Spain
[2] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
[3] King Abdulaziz Univ, Fac Comp & Informat Technol North Jeddah, Jeddah 21589, Saudi Arabia
Keywords
Feature selection; Microarray data; Unbalanced data; Dataset shift; GENE-EXPRESSION; IMBALANCED DATA; FEATURE SUBSET; MOLECULAR CLASSIFICATION; SURVIVAL PREDICTION; COMPLEXITY-MEASURES; MINIMUM REDUNDANCY; CANCER; ADENOCARCINOMA; FILTER;
DOI
10.1016/j.ins.2014.05.042
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Microarray data classification is a difficult challenge for machine learning researchers because of the high number of features and the small sample sizes. Feature selection was soon considered a de facto standard in this field after its introduction, and a huge number of feature selection methods have been applied in an attempt to reduce the input dimensionality while improving classification performance. This paper reviews the most up-to-date feature selection methods developed in this field and the microarray databases most frequently used in the literature. We also make the interested reader aware of the problems posed by data characteristics in this domain, such as the imbalance of the data, their complexity, or the so-called dataset shift. Finally, an experimental evaluation on the most representative datasets using well-known feature selection methods is presented, bearing in mind that the aim is not to identify the best feature selection method, but to facilitate their comparative study by the research community. (C) 2014 Elsevier Inc. All rights reserved.
Pages: 111-135
Page count: 25