The Impact of Multifunctional Genes on "Guilt by Association" Analysis

Cited by: 136
Authors
Gillis, Jesse [1, 2]
Pavlidis, Paul [1, 2]
Affiliations
[1] Univ British Columbia, Dept Psychiat, Ctr High Throughput Biol, Vancouver, BC, Canada
[2] Univ British Columbia, Michael Smith Labs, Vancouver, BC V5Z 1M9, Canada
Keywords
PROTEIN-PROTEIN INTERACTIONS; SYSTEMATIC METAANALYSES; COEXPRESSION NETWORKS; HIGH-THROUGHPUT; BY-ASSOCIATION; SCALE; MODULARITY; EVOLUTION; COST; RESOURCE;
DOI
10.1371/journal.pone.0017258
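Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;
Abstract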
Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
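The abstract's central claim, that a gene's degree of multifunctionality alone can act as a predictor of gene function, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy gene and function names, the use of a plain annotation count as the multifunctionality score, and the use of AUROC as the evaluation metric. The same ranking of genes is reused for every function, and no network or association data is consulted.

```python
from typing import Dict, Set


def multifunctionality(annotations: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score each gene by the number of functions it is annotated with.

    Assumption: a plain count stands in for 'degree of multifunctionality'.
    """
    return {gene: len(funcs) for gene, funcs in annotations.items()}


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """AUROC as the fraction of (positive, negative) gene pairs in which
    the positive gene has the higher score; ties count as half a win."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Use the single multifunctionality ranking to 'predict' every function."""
    scores = multifunctionality(annotations)
    functions = sorted(set().union(*annotations.values()))
    return {
        f: auroc(scores, {g for g, fs in annotations.items() if f in fs})
        for f in functions
    }


# Hypothetical toy annotations (gene -> set of function labels).
ANNOTATIONS: Dict[str, Set[str]] = {
    "geneA": {"f1", "f2", "f3"},
    "geneB": {"f1", "f2"},
    "geneC": {"f3"},
    "geneD": {"f1"},
    "geneE": set(),
}

if __name__ == "__main__":
    for function, score in evaluate(ANNOTATIONS).items():
        print(f"{function}: AUROC = {score:.3f}")
```

In this toy setup the score for a gene includes the very annotation being predicted and nothing is held out, so the values are optimistic; a realistic evaluation would hide some annotations (for example by cross-validation) before scoring.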
Pages: 16