An Ensemble Framework Coping with Instability in the Gene Selection Process

Cited by: 0
Authors
José A. Castellanos-Garzón
Juan Ramos
Daniel López-Sánchez
Juan F. de Paz
Juan M. Corchado
Affiliations
[1] University of Salamanca, IBSAL/BISITE Research Group
[2] University of Coimbra, CISUC, ECOS Research Group
[3] Osaka Institute of Technology
Source
Interdisciplinary Sciences: Computational Life Sciences | 2018 / Vol. 10
Keywords
Gene selection; Filter method; Ensemble method; Wrapper method; Machine learning; Data mining; Gene expression data
DOI
Not available
Abstract
This paper proposes an ensemble framework for gene selection that addresses instability in the gene filtering task. Selecting genes from gene expression data is a complex process, and the informative gene subsets found by different filter methods often disagree, which makes it difficult for experts to identify significant genes. This instability can stem from the filter methods, the classification methods, different datasets of the same disease, and the existence of multiple valid groups of biomarkers. Despite the large number of existing proposals, the problem remains a challenge. This work proposes a framework with five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. The framework performs stable feature selection that addresses the problems above, providing a more suitable and reliable solution for clinical and research purposes. It is built as a multistage gene filtering process that incorporates several ensemble strategies for gene selection, so that different classifiers simultaneously assess gene subsets to counter instability. First, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability with respect to filter methods). Next, we apply an ensemble of well-known classifiers to retain the genes that are relevant to all classifiers at once (stability with respect to classification methods). The results were evaluated on two different datasets of the same disease (pancreatic ductal adenocarcinoma) to assess stability with respect to the disease, with promising results.
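The two ensemble steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' five-stage framework: the two filter scores, the two single-gene classifiers, the leave-one-out evaluation, the `k` and `min_acc` parameters, and the toy expression values are all assumptions chosen only to show the idea of pooling genes across filters and then keeping the genes that every classifier accepts.

```python
from statistics import mean, pstdev

def split(values, labels):
    """Separate one gene's expression values by class label (0 or 1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return a, b

def mean_gap(values, labels):
    """Hypothetical filter 1: absolute difference between class means."""
    a, b = split(values, labels)
    return abs(mean(a) - mean(b))

def signal_to_noise(values, labels):
    """Hypothetical filter 2: mean gap scaled by the class standard deviations."""
    a, b = split(values, labels)
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)

def filter_ensemble(data, labels, filters, k):
    """Step 1: pool (union) the top-k genes of every filter to gain diversity."""
    pooled = set()
    for score in filters:
        scores = {gene: score(vals, labels) for gene, vals in data.items()}
        pooled |= set(sorted(scores, key=scores.get, reverse=True)[:k])
    return pooled

def centroid_predict(train_x, train_y, x):
    """Toy classifier 1: assign x to the class with the nearest mean."""
    a, b = split(train_x, train_y)
    return 0 if abs(x - mean(a)) <= abs(x - mean(b)) else 1

def nearest_neighbour_predict(train_x, train_y, x):
    """Toy classifier 2: copy the label of the closest training sample."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def loo_accuracy(predict, values, labels):
    """Leave-one-out accuracy of a single-gene classifier."""
    hits = 0
    for i in range(len(values)):
        train_x = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(train_x, train_y, values[i]) == labels[i]
    return hits / len(values)

def classifier_ensemble(data, labels, genes, classifiers, min_acc):
    """Step 2: keep only genes that every classifier rates at or above min_acc."""
    return {
        g for g in genes
        if all(loo_accuracy(c, data[g], labels) >= min_acc for c in classifiers)
    }

# Toy expression matrix (made-up values): gene -> one value per sample.
labels = [0, 0, 0, 1, 1, 1]
data = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # clean separation
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # uninformative
    "geneC": [0.0, 6.0, 0.0, 9.0, 3.0, 9.0],  # large gap but very noisy
    "geneD": [3.0, 3.1, 2.9, 1.0, 1.1, 0.9],  # clean separation, smaller gap
}

candidates = filter_ensemble(data, labels, [mean_gap, signal_to_noise], k=2)
stable = classifier_ensemble(
    data, labels, candidates,
    [centroid_predict, nearest_neighbour_predict], min_acc=0.8,
)
print(sorted(candidates))  # ['geneA', 'geneC', 'geneD']
print(sorted(stable))      # ['geneA', 'geneD']
```

In this toy run the two filters disagree on their top genes, so the pooled candidate set is larger than either filter's own choice, and the noisy gene is then discarded because neither classifier reaches the accuracy threshold on it.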
Pages: 12-23
Page count: 11