The Impact of Multifunctional Genes on "Guilt by Association" Analysis

Cited by: 136
Authors
Gillis, Jesse [1, 2]
Pavlidis, Paul [1, 2]
Affiliations
[1] Univ British Columbia, Dept Psychiat, Ctr High Throughput Biol, Vancouver, BC, Canada
[2] Univ British Columbia, Michael Smith Labs, Vancouver, BC V5Z 1M9, Canada
Keywords
PROTEIN-PROTEIN INTERACTIONS; SYSTEMATIC METAANALYSES; COEXPRESSION NETWORKS; HIGH-THROUGHPUT; BY-ASSOCIATION; SCALE; MODULARITY; EVOLUTION; COST; RESOURCE
DOI
10.1371/journal.pone.0017258
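Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract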
Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
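The abstract's claim that multifunctionality alone can act as a predictor can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that multifunctionality can be approximated by the raw number of functions annotated to each gene, and that performance is measured by the area under the ROC curve (AUROC) for each function. The gene names, function names, and annotations are hypothetical toy data. Note that the count includes the function being evaluated, which is a simplification.

```python
from typing import Dict, Set

def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Probability that a random positive gene outscores a random negative gene.

    Ties contribute 0.5, which matches the rank-based AUROC definition.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical toy annotations (function -> annotated genes); not real data.
annotations: Dict[str, Set[str]] = {
    "F1": {"g1", "g2", "g3"},
    "F2": {"g1", "g2"},
    "F3": {"g1", "g4"},
    "F4": {"g1", "g3"},
}
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]

# Assumption: multifunctionality = number of functions a gene is annotated with.
multifunctionality: Dict[str, float] = {
    g: float(sum(g in members for members in annotations.values()))
    for g in genes
}

# Use the SAME gene ranking to "predict" every function. No interaction
# data is consulted at any point.
per_function_auroc = {
    name: auroc(multifunctionality, members)
    for name, members in annotations.items()
}

for name, value in per_function_auroc.items():
    print(f"{name}: AUROC = {value:.4f}")
```

Because a single fixed ranking is reused for every function, any AUROC well above 0.5 in such a setup reflects annotation bias, not information about which gene associates with which.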
Pages: 16