Learning gene functional classifications from multiple data types

Cited by: 153
Authors
Pavlidis, P
Weston, J
Cai, JS
Noble, WS
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Columbia Univ, Columbia Genome Ctr, New York, NY 10027 USA
[3] Columbia Univ, Dept Med Informat, New York, NY 10027 USA
[4] BIOwulf Technol LLC, New York, NY 10007 USA
Keywords
gene functional classification; phylogenetic profiles; microarray expression analysis
DOI
10.1089/10665270252935539
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. In addition, we describe feature scaling methods for further exploiting prior knowledge of heterogeneity by giving each data type different weights.
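The abstract does not give the form of the heterogeneous kernel or of the feature scaling, so the following is only a minimal sketch of one plausible reading: a kernel is computed separately on each data type and the results are summed, so that no cross terms mix expression features with phylogenetic features. The polynomial kernel, its degree, the function names, the weights, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
def poly_kernel(x, y, degree=3):
    """Polynomial kernel (x . y + 1)^degree on two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def scale_features(x, n_expr, s_expr=1.0, s_phylo=1.0):
    """Multiply each data type's features by its own weight (hypothetical helper).

    The first n_expr entries are treated as expression measurements,
    the remaining entries as phylogenetic profile values.
    """
    return [v * s_expr for v in x[:n_expr]] + [v * s_phylo for v in x[n_expr:]]


def heterogeneous_kernel(x, y, n_expr, degree=3):
    """Sum of per-data-type kernels (assumed form, not the paper's definition)."""
    k_expr = poly_kernel(x[:n_expr], y[:n_expr], degree)
    k_phylo = poly_kernel(x[n_expr:], y[n_expr:], degree)
    return k_expr + k_phylo


# Toy genes: 2 expression values followed by 2 phylogenetic profile bits.
gene_a = [0.5, -1.0, 1.0, 0.0]
gene_b = [1.0, 0.0, 1.0, 1.0]

k_het = heterogeneous_kernel(gene_a, gene_b, n_expr=2)
k_hom = poly_kernel(gene_a, gene_b)  # single kernel on the concatenated vector
```

In this sketch `k_hom` expands into products that combine features from both data types, while `k_het` only contains products within a data type. Calling `scale_features` on each vector before either kernel is one way to give the two data types different weights.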
Pages: 401-411
Page count: 11