Feature Selection for Microarray Gene Expression Data Using Simulated Annealing Guided by the Multivariate Joint Entropy

Cited by: 14
Authors
Fernando Gonzalez-Navarro, Felix [1]
Belanche-Munoz, Lluis A. [2 ]
Affiliations
[1] Univ Autonoma Baja California, Inst Ingn, Mexicali, Baja California, Mexico
[2] Univ Politecn Cataluna, Dept Llenguatges & Sistemes Informat, Barcelona, Spain
Source
COMPUTACION Y SISTEMAS | 2014 / Vol. 18 / No. 02
Keywords
Feature selection; microarray gene expression data; multivariate joint entropy; simulated annealing
DOI
10.13053/CyS-18-2-2014-032
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Microarray classification poses many challenges for data analysis, given that a gene expression data set may consist of dozens of observations with thousands or even tens of thousands of genes. In this context, feature subset selection techniques can be very useful to reduce the representation space to one that is manageable by classification techniques. In this work we use the discretized multivariate joint entropy as the basis for a fast evaluation of gene relevance in a microarray gene expression context. The proposed algorithm combines a simulated annealing schedule specially designed for feature subset selection with the incrementally computed joint entropy, reusing previous values to compute the relevance of the current feature subset. This combination turns out to be a powerful tool when applied to the maximization of gene subset relevance. Our method delivers highly interpretable solutions that are more accurate than those of competing methods. The algorithm is fast, effective and has no critical parameters. The experimental results on several public-domain microarray data sets show notably high classification performance and small subsets, formed mostly by biologically meaningful genes. The technique is general and could be used in other similar scenarios.
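The general idea in the abstract, simulated annealing over feature subsets scored by an entropy-based relevance, can be illustrated with a minimal sketch. This is not the authors' algorithm. The relevance score used here (mutual information between the discretized subset and the class label), the single-feature toggle move, the geometric cooling schedule, and all function and parameter names are assumptions made for illustration. The sketch also recomputes entropies from scratch and does not implement the incremental reuse of previous values that the abstract describes.

```python
import math
import random
from collections import Counter


def joint_entropy(rows, cols):
    """Shannon joint entropy (in bits) of the discretized columns `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def relevance(rows, labels, subset):
    """Assumed score: I(X_S; Y) = H(X_S) + H(Y) - H(X_S, Y)."""
    cols = sorted(subset)
    # Append the class label as an extra column to get H(X_S, Y).
    joint = [tuple(r[c] for c in cols) + (y,) for r, y in zip(rows, labels)]
    h_x = joint_entropy(rows, cols)
    h_y = joint_entropy([(y,) for y in labels], [0])
    h_xy = joint_entropy(joint, range(len(cols) + 1))
    return h_x + h_y - h_xy


def anneal(rows, labels, n_features, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Hypothetical annealing loop: toggle one feature per step."""
    rng = random.Random(seed)
    current = set()
    score = relevance(rows, labels, current)
    best, best_score = set(current), score
    t = t0
    for _ in range(steps):
        candidate = current ^ {rng.randrange(n_features)}  # add or drop one feature
        cand_score = relevance(rows, labels, candidate)
        delta = cand_score - score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, score = candidate, cand_score
            smaller_tie = score == best_score and len(current) < len(best)
            if score > best_score or smaller_tie:
                best, best_score = set(current), score
        t *= cooling  # geometric cooling (an assumption)
    return best, best_score
```

On a toy data set where one discretized feature determines the class, `anneal(rows, labels, n_features)` should return a subset containing that feature with a score equal to the class entropy. Real microarray data would need a discretization step first, which is outside the scope of this sketch.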
Pages: 275-293
Page count: 19