Feature construction from synergic pairs to improve microarray-based classification

Times cited: 17
Authors
Hanczar, Blaise [1]
Zucker, Jean-Daniel
Henegar, Corneliu
Saitta, Lorenza
Affiliations
[1] Univ Paris 13, Lab Informat Med & Bioinformat, F-93017 Bobigny, France
[2] Univ Paris Descartes, F-75006 Paris, France
[3] Univ Paris 06, Ctr Rech Cordeliers, UMR S 872, F-75006 Paris, France
[4] INSERM, U872, F-75006 Paris, France
[5] Univ Piemonte Orientale, Dipartimento Informat, I-15100 Alessandria, Italy
DOI
10.1093/bioinformatics/btm429
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Microarray experiments, which allow simultaneous expression profiling of thousands of genes under various conditions (tissues, cells or time points), generate data whose analysis raises difficult problems. In particular, there is a vast disproportion between the number of attributes (tens of thousands) and the number of examples (several tens). Dimension reduction is therefore a key step before applying classification approaches. Many methods have been proposed for this purpose, but only a few of them directly quantify transcriptional interactions. We describe and experimentally validate a new dimension reduction and feature construction method, which assesses interactions between expression profiles to improve microarray-based classification accuracy.
Results: Our approach relies on a mutual information measure that exposes some elementary constituents of the information contained in a pair of gene expression profiles. We show that their analysis yields a term that represents the information of the interaction between the two genes. The principle of our method, called FeatKNN, is to exploit the information provided by highly synergic gene pairs to improve classification accuracy. First, a heuristic search selects the most informative gene pairs. Then, for each selected pair, a new feature is constructed that represents the classification margin of a KNN classifier in the space of that gene pair. We show experimentally that the interaction information has a degree of significance comparable to that of the gene expression profiles considered separately. Our method has been tested with different classifiers and yielded significant improvements in accuracy on several public microarray databases. Moreover, a synthetic assessment of the biological significance of the concept of synergic gene pairs suggested that it can uncover relevant mechanisms underlying interactions among various cellular processes.
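The abstract only outlines FeatKNN. The following is a minimal Python sketch of the two steps under stated assumptions, and it is not the authors' implementation. It assumes a common definition of pair synergy, Syn(G1,G2;C) = I(G1,G2;C) - I(G1;C) - I(G2;C), computed on equal-frequency discretized profiles. In place of the paper's heuristic search, it ranks all pairs exhaustively. It also defines the KNN margin hypothetically as (class-1 votes - class-0 votes) / k among the k nearest samples in the two-dimensional space of a pair, with each training sample left out of its own neighbourhood. The function names `discretize`, `synergy`, `top_synergic_pairs`, `knn_margin` and `build_features`, and the parameters `bins` and `k`, are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=3):
    """Equal-frequency binning of one expression profile (assumed scheme)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    codes = [0] * len(values)
    for rank, i in enumerate(order):
        codes[i] = rank * bins // len(values)
    return codes


def entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def synergy(g1, g2, labels):
    """Assumed definition: Syn = I(G1,G2;C) - I(G1;C) - I(G2;C)."""
    joint = list(zip(g1, g2))
    return (mutual_info(joint, labels)
            - mutual_info(g1, labels) - mutual_info(g2, labels))


def top_synergic_pairs(data, labels, n_pairs=10, bins=3):
    """Rank every gene pair by synergy (exhaustive stand-in for the heuristic search).

    data: one list of expression values per gene, aligned with labels (0/1).
    """
    disc = [discretize(gene, bins) for gene in data]
    scored = [(synergy(disc[i], disc[j], labels), i, j)
              for i, j in combinations(range(len(data)), 2)]
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:n_pairs]]


def knn_margin(points, labels, query, k=3, exclude=None):
    """Hypothetical margin in [-1, 1]: (class-1 votes - class-0 votes) / k."""
    neighbours = sorted((math.dist(p, query), idx)
                        for idx, p in enumerate(points) if idx != exclude)
    votes = [labels[idx] for _, idx in neighbours[:k]]
    return (votes.count(1) - votes.count(0)) / k


def build_features(data, labels, pairs, k=3):
    """One new feature per selected pair, using leave-one-out margins on training data."""
    features = []
    for i, j in pairs:
        points = list(zip(data[i], data[j]))
        features.append([knn_margin(points, labels, points[s], k, exclude=s)
                         for s in range(len(labels))])
    return features


if __name__ == "__main__":
    # Toy XOR data: genes 0 and 1 are individually uninformative but jointly
    # determine the class, and gene 2 is noise. Each gene has 8 samples.
    data = [
        [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3],
        [1.0, 1.1, 5.0, 5.1, 1.2, 1.3, 5.2, 5.3],
        [1.0, 9.0, 2.0, 8.0, 3.0, 7.0, 4.0, 6.0],
    ]
    labels = [0, 0, 1, 1, 1, 1, 0, 0]
    pairs = top_synergic_pairs(data, labels, n_pairs=1, bins=2)
    print(pairs)  # [(0, 1)]
    print(build_features(data, labels, pairs, k=1))
    # [[-1.0, -1.0, 1.0, 1.0, 1.0, 1.0, -1.0, -1.0]]
```

On this toy XOR data, neither gene carries any information about the class on its own, yet together the two genes determine it. This is exactly the case that a synergy score is meant to detect. For a new sample, `knn_margin` would be called with `exclude=None` and the sample's coordinates in each selected pair. In a real evaluation, pair selection and margin computation should be repeated inside each cross-validation fold, because selecting pairs on the full dataset biases the accuracy estimate.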
Pages: 2866-2872
Number of pages: 7