Exploring correlations in gene expression microarray data for maximum predictive-minimum redundancy biomarker selection and classification

Cited by: 8
Authors
Arevalillo, Jorge M. [1]
Navarro, Hilario [1]
Affiliations
[1] Univ Nacl Educ Distancia, Dept Stat Operat Res & Numer Anal, E-28040 Madrid, Spain
Keywords
Microarray data; Gene expression; Biomarker selection; Classification and prediction; Redundancy; Classification and regression tree; CARCINOMA-CELLS; CANCER; TUMOR; INTERLEUKIN-8; DISCOVERY; DIAGNOSIS
DOI
10.1016/j.compbiomed.2013.07.005
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
An important issue in the analysis of gene expression microarray data is the extraction of valuable genetic interactions from high-dimensional data sets containing gene expression levels collected for a small sample of assays. Past and ongoing research efforts have focused on biomarker selection for phenotype classification. Usually, many genes convey no useful information for classifying the outcome and should be removed from the analysis; on the other hand, some genes may be highly correlated, which reveals the presence of redundant expression information. In this paper we propose a method for selecting highly predictive genes with low redundancy in their expression levels. The predictive accuracy of the selection is assessed by means of Classification and Regression Trees (CART) models, which enable assessment of how well the selected genes classify the outcome variable and can also uncover complex genetic interactions. The method is illustrated throughout the paper using a public domain colon cancer gene expression data set. © 2013 Elsevier Ltd. All rights reserved.
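The abstract does not spell out how predictiveness and redundancy are scored, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes relevance is the absolute Pearson correlation between a gene and a binary class label, and that a gene counts as redundant when its absolute correlation with an already selected gene exceeds a threshold. The names `pearson`, `select_genes` and `max_redundancy`, and the toy data, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_genes(expr, labels, k, max_redundancy=0.8):
    """Greedily pick up to k genes that are relevant to the labels and
    weakly correlated with each other.

    expr: dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels (one per sample)
    max_redundancy: assumed cutoff on |correlation| between selected genes
    """
    # Assumed relevance score: |correlation| between gene and class label.
    relevance = {g: abs(pearson(v, labels)) for g, v in expr.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)

    selected = []
    for gene in ranked:
        if len(selected) == k:
            break
        # Skip a gene that is too correlated with any gene already chosen.
        if all(abs(pearson(expr[gene], expr[s])) <= max_redundancy
               for s in selected):
            selected.append(gene)
    return selected


# Toy data (made up for illustration): g2 is an exact multiple of g1,
# so the two carry the same information; g4 is constant.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1, 2, 1, 5, 6, 5],
    "g2": [2, 4, 2, 10, 12, 10],
    "g3": [3, 1, 2, 4, 3, 5],
    "g4": [1, 1, 1, 1, 1, 1],
}
selected = select_genes(expr, labels, k=2)
print(selected)
```

In a fuller pipeline, the selected genes could then be passed to a tree-based classifier (for example a CART-style decision tree) and evaluated by cross-validation, which is the role the abstract assigns to CART models.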
Pages: 1437-1443
Number of pages: 7