A large-scale benchmark of gene prioritization methods

Cited by: 36
Authors
Guala, Dimitri [1]
Sonnhammer, Erik L. L. [1]
Affiliations
[1] Stockholm Univ, Dept Biochem & Biophys, Stockholm Bioinformat Ctr, Sci Life Lab, Box 1031, S-17121 Solna, Sweden
Keywords
RANDOM-WALK; DISEASE GENES; WEB TOOLS; NETWORK; IDENTIFICATION; PREDICTION
DOI
10.1038/srep46598
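Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract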
In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
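As an illustration of the kind of evaluation the abstract describes, the following is a minimal sketch of Random Walk with Restart scored by leave-one-out cross-validation on a single annotation-style gene set. It is not the authors' benchmark code and does not use FunCoup: the function names `random_walk_with_restart` and `leave_one_out_ranks`, the restart probability of 0.3, the leave-one-out fold scheme, and the eight-node toy network are all assumptions made for this example.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk that restarts at the seeds."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    W = adj / col_sums  # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # uniform restart distribution over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def leave_one_out_ranks(adj, gene_set, restart=0.3):
    """Hide each gene in turn, seed with the rest, and rank the hidden gene among non-seeds."""
    adj = np.asarray(adj, dtype=float)
    ranks = {}
    for held_out in gene_set:
        seeds = [g for g in gene_set if g != held_out]
        scores = random_walk_with_restart(adj, seeds, restart)
        candidates = [g for g in range(adj.shape[0]) if g not in seeds]
        ordered = sorted(candidates, key=lambda g: -scores[g])
        ranks[held_out] = ordered.index(held_out) + 1  # 1 = best possible rank
    return ranks


# Toy network (hypothetical): two 4-node cliques joined by a single edge 3-4.
n = 8
adj = np.zeros((n, n))
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

# Treat the first clique as one annotation term's gene set.
ranks = leave_one_out_ranks(adj, [0, 1, 2, 3])
print(ranks)
```

A full benchmark would repeat this over many gene sets and summarize the ranks with performance measures such as the area under the ROC curve. The sketch only shows the hide-and-recover loop for one set.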
Pages: 10
Related papers
41 in total
[1]   Comparative interactomics with Funcoup 2.0 [J].
Alexeyenko, Andrey ;
Schmitt, Thomas ;
Tjarnberg, Andreas ;
Guala, Dimitri ;
Frings, Oliver ;
Sonnhammer, Erik L. L. .
NUCLEIC ACIDS RESEARCH, 2012, 40 (D1) :D821-D828
[2]   Global networks of functional coupling in eukaryotes from comprehensive data integration [J].
Alexeyenko, Andrey ;
Sonnhammer, Erik L. L. .
GENOME RESEARCH, 2009, 19 (06) :1107-1116
[3]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[4]   Controlling the False Discovery Rate - A Practical and Powerful Approach to Multiple Testing [J].
Benjamini, Y ;
Hochberg, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]   An unbiased evaluation of gene prioritization tools [J].
Bornigen, Daniela ;
Tranchevent, Leon-Charles ;
Bonachela-Capdevila, Francisco ;
Devriendt, Koenraad ;
De Moor, Bart ;
De Causmaecker, Patrick ;
Moreau, Yves .
BIOINFORMATICS, 2012, 28 (23) :3081-3088
[6]   The use of the area under the ROC curve in the evaluation of machine learning algorithms [J].
Bradley, AP .
PATTERN RECOGNITION, 1997, 30 (07) :1145-1159
[7]   Chapter 15: Disease Gene Prioritization [J].
Bromberg, Yana .
PLOS COMPUTATIONAL BIOLOGY, 2013, 9 (04)
[8]   Disease candidate gene identification and prioritization using protein interaction networks [J].
Chen, Jing ;
Aronow, Bruce J. ;
Jegga, Anil G. .
BMC BIOINFORMATICS, 2009, 10
[9]   Recent approaches to the prioritization of candidate disease genes [J].
Doncheva, Nadezhda T. ;
Kacprowski, Tim ;
Albrecht, Mario .
WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE, 2012, 4 (05) :429-442
[10]   MaxLink: network-based prioritization of genes tightly linked to a disease seed set [J].
Guala, Dimitri ;
Sjolund, Erik ;
Sonnhammer, Erik L. L. .
BIOINFORMATICS, 2014, 30 (18) :2689-2690