The Impact of Multifunctional Genes on "Guilt by Association" Analysis
Cited by: 136
Authors:
Gillis, Jesse [1,2]
Pavlidis, Paul [1,2]
Affiliations:
[1] Univ British Columbia, Dept Psychiat, Ctr High Throughput Biol, Vancouver, BC, Canada
[2] Univ British Columbia, Michael Smith Labs, Vancouver, BC V5Z 1M9, Canada
Source:
Keywords:
PROTEIN-PROTEIN INTERACTIONS;
SYSTEMATIC METAANALYSES;
COEXPRESSION NETWORKS;
HIGH-THROUGHPUT;
BY-ASSOCIATION;
SCALE;
MODULARITY;
EVOLUTION;
COST;
RESOURCE;
DOI:
10.1371/journal.pone.0017258
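Chinese Library Classification (CLC):
O [Mathematical Sciences and Chemistry];
P [Astronomy and Earth Sciences];
Q [Biological Sciences];
N [General Natural Sciences];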
Discipline classification codes:
07 ;
0710 ;
09 ;
Abstract:
Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
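The abstract's central claim, that a gene's degree of multifunctionality alone can act as a function predictor, can be illustrated with a minimal sketch. It assumes multifunctionality is approximated by a raw count of a gene's annotations in all other functions, which is a simplification and not necessarily the scoring used in the paper. The annotation matrix `toy` and the helper names `average_ranks`, `auroc`, and `multifunctionality_auroc` are hypothetical and exist only for this illustration.

```python
import numpy as np

def average_ranks(scores):
    """Return 1-based ranks; tied scores share the mean of their ranks."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # undefined without both classes
    ranks = average_ranks(scores)
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def multifunctionality_auroc(annotations):
    """For each function (column), rank genes by how many OTHER functions
    they are annotated with, and score that ranking against membership.
    No information about which gene interacts with which is used."""
    annotations = np.asarray(annotations, dtype=bool)
    results = []
    for j in range(annotations.shape[1]):
        others = np.delete(annotations, j, axis=1)
        score = others.sum(axis=1)  # assumed proxy for multifunctionality
        results.append(auroc(score, annotations[:, j]))
    return np.array(results)

# Made-up toy data: 6 genes (rows) x 3 functions (columns).
toy = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
])

if __name__ == "__main__":
    print(multifunctionality_auroc(toy))
```

An AUROC above 0.5 for a function means that the annotation count alone ranks its member genes above non-members more often than chance. On real annotation sets, a baseline of this kind is the sort of computational control the abstract argues for, under the assumptions stated above.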
Pages: 16