Kernel-based data fusion for gene prioritization

Cited by: 90
Authors
De Bie, Tijl
Tranchevent, Leon-Charles
Van Oeffelen, Liesbeth M. M.
Moreau, Yves
Affiliations
[1] Univ Bristol, Dept Engn Math, Bristol BS8 1TR, Avon, England
[2] Katholieke Univ Leuven, OKP Res Grp, B-3000 Louvain, Belgium
[3] Katholieke Univ Leuven, ESAT SCD, B-3001 Louvain, Belgium
DOI
10.1093/bioinformatics/btm187
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Hunting disease genes is a problem of primary importance in biomedical research. Biologists usually approach this problem in two steps: first, a set of candidate genes is identified using traditional positional cloning or high-throughput genomics techniques; second, these genes are further investigated and validated in the wet lab, one by one. To speed up discovery and limit the number of costly wet lab experiments, biologists must test the candidate genes starting with the most probable candidates. So far, biologists have relied on literature studies, extensive queries to multiple databases and hunches about expected properties of the disease gene to determine such an ordering. Recently, we have introduced the data mining tool ENDEAVOUR (Aerts et al., 2006), which performs this task automatically by relying on different genome-wide data sources, such as Gene Ontology, literature, microarray, sequence and more.
Results: In this article, we present a novel kernel method that operates in the same setting: based on a number of different views on a set of training genes, a prioritization of test genes is obtained. We furthermore provide a thorough learning-theoretical analysis of the method's guaranteed performance. Finally, we apply the method to the disease data sets on which ENDEAVOUR (Aerts et al., 2006) has been benchmarked, and report a considerable improvement in empirical performance.
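The setting described in the Results (several views of the same training genes, fused to rank test genes) can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so this is not the authors' method: as a stand-in it assumes an RBF kernel per view, a uniform weighted average of the per-view kernels, and a score equal to each test gene's mean kernel similarity to the training genes. The names `rbf_kernel` and `prioritize` and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def prioritize(train_views, test_views, weights=None):
    """Rank test genes by weighted mean kernel similarity to the training genes.

    train_views / test_views: one feature matrix per data source (view),
    with rows aligned to the same genes across views.
    Assumption: uniform view weights unless given; the paper's own
    weighting scheme is not described in the abstract.
    """
    n_views = len(train_views)
    if weights is None:
        weights = np.full(n_views, 1.0 / n_views)
    weights = np.asarray(weights, dtype=float)
    scores = np.zeros(test_views[0].shape[0])
    for w, X_train, X_test in zip(weights, train_views, test_views):
        K = rbf_kernel(X_test, X_train)   # shape: (n_test, n_train)
        scores += w * K.mean(axis=1)      # similarity to the training set in this view
    ranking = np.argsort(-scores)         # highest-scoring test gene first
    return ranking, scores

# Toy data (hypothetical): two views, three training genes, two test genes.
train_views = [
    np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]),
    np.array([[1.0], [1.1], [0.9]]),
]
test_views = [
    np.array([[0.05, 0.05], [3.0, 3.0]]),
    np.array([[1.0], [5.0]]),
]
ranking, scores = prioritize(train_views, test_views)
print(ranking, scores)
```

In this toy example, test gene 0 lies close to the training genes in both views and is therefore ranked first. Averaging kernels is only one simple way to fuse views; a method that learns the view weights would replace the `weights` default.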
Pages: I125-I132
Number of pages: 8
Related papers
13 in total
[1]   Gene prioritization through genomic data fusion [J].
Aerts, S ;
Lambrechts, D ;
Maity, S ;
Van Loo, P ;
Coessens, B ;
De Smet, F ;
Tranchevent, LC ;
De Moor, B ;
Marynen, P ;
Hassan, B ;
Carmeliet, P ;
Moreau, Y .
NATURE BIOTECHNOLOGY, 2006, 24 (05) :537-544
[2]  
[Anonymous], 2004, INT C MACH LEARN
[3]  
[Anonymous], 2004, KERNEL METHODS PATTE
[4]  
HERRMANN D, 2003, ADV NEURAL INFORM PR, P415
[5]   Ensembl 2005 [J].
Hubbard, T ;
Andrews, D ;
Caccamo, M ;
Cameron, G ;
Chen, Y ;
Clamp, M ;
Clarke, L ;
Coates, G ;
Cox, T ;
Cunningham, F ;
Curwen, V ;
Cutts, T ;
Down, T ;
Durbin, R ;
Fernandez-Suarez, XM ;
Gilbert, J ;
Hammond, M ;
Herrero, J ;
Hotz, H ;
Howe, K ;
Iyer, V ;
Jekosch, K ;
Kahari, A ;
Kasprzyk, A ;
Keefe, D ;
Keenan, S ;
Kokocinski, F ;
London, D ;
Longden, I ;
McVicker, G ;
Melsopp, C ;
Meidl, P ;
Potter, S ;
Proctor, G ;
Rae, M ;
Rios, D ;
Schuster, M ;
Searle, S ;
Severin, J ;
Slater, G ;
Smedley, D ;
Smith, J ;
Spooner, W ;
Stabenau, A ;
Stalker, J ;
Storey, R ;
Trevanion, S ;
Ureta-Vidal, A ;
Vogel, J ;
White, S .
NUCLEIC ACIDS RESEARCH, 2005, 33 :D447-D453
[6]  
Lanckriet GRG, 2003, PACIFIC SYMPOSIUM ON BIOCOMPUTING 2004, P300
[7]  
Lanckriet GRG, 2004, J MACH LEARN RES, V5, P27
[8]   A statistical framework for genomic data fusion [J].
Lanckriet, GRG ;
De Bie, T ;
Cristianini, N ;
Jordan, MI ;
Noble, WS .
BIOINFORMATICS, 2004, 20 (16) :2626-2635
[9]   Fast kernels for inexact string matching [J].
Leslie, C ;
Kuang, R .
LEARNING THEORY AND KERNEL MACHINES, 2003, 2777 :114-128
[10]  
Ong CS, 2005, J MACH LEARN RES, V6, P1043