Evaluating network-based missing protein prediction using p-values, Bayes Factors, and probabilities

Cited by: 1
Authors
Goh, Wilson Wen Bin [1, 2, 3]
Kong, Weijia [1, 3]
Wong, Limsoon [4]
Affiliations
[1] Nanyang Technol Univ, Lee Kong Chian Sch Med, Singapore 308232, Singapore
[2] Nanyang Technol Univ, Ctr Biomed Informat, Singapore 308232, Singapore
[3] Nanyang Technol Univ, Sch Biol Sci, Singapore 637551, Singapore
[4] Natl Univ Singapore, Dept Comp Sci, Singapore 117417, Singapore
Keywords
Statistics; data science; machine learning; missing proteins; networks; PROTREC; p-values; Bayes factors
DOI
10.1142/S0219720023500051
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Some prediction methods rank their predictions by probability, while others do not rank their predictions and instead support them with p-values. This disparity makes direct cross-comparison of the two kinds of methods difficult. In particular, approaches that convert p-values, such as the Bayes Factor upper Bound (BFB), may not make correct assumptions for this kind of cross-comparison. Here, using a well-established case study on renal cancer proteomics and in the context of missing protein prediction, we demonstrate how to compare these two kinds of prediction methods using two different strategies. The first strategy is based on false discovery rate (FDR) estimation, which does not make the same naive assumptions as BFB conversions. The second strategy is a powerful approach that we colloquially call "home ground testing". Both strategies perform better than BFB conversions. Thus, we recommend comparing prediction methods by standardization to a common performance benchmark such as a global FDR. Where this is not possible, we recommend reciprocal "home ground testing".
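To make the two conversions named in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the commonly used form of the Bayes Factor upper Bound, BFB = 1 / (-e · p · ln p) for p < 1/e, and uses the Benjamini-Hochberg procedure as a stand-in for "a global FDR"; this record does not describe the paper's own FDR estimation procedure. The function names and the equal-prior-odds default are illustrative assumptions.

```python
import math


def bayes_factor_upper_bound(p):
    """Upper bound on the Bayes factor in favour of H1 for a p-value.

    Assumed form: 1 / (-e * p * ln p), valid for 0 < p < 1/e.
    For p >= 1/e the bound is taken as 1 (no evidence for H1).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


def posterior_upper_bound(p, prior_odds=1.0):
    """Upper bound on P(H1 | data), given prior odds of H1 to H0.

    prior_odds=1.0 (equal priors) is an illustrative assumption.
    """
    posterior_odds = prior_odds * bayes_factor_upper_bound(p)
    return posterior_odds / (1.0 + posterior_odds)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    for p in (0.05, 0.005):
        print(p, bayes_factor_upper_bound(p), posterior_upper_bound(p))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Under these assumptions, the BFB route gives each prediction its own per-prediction bound, whereas the FDR route places all predictions from both kinds of methods on one shared error-rate scale, which is the kind of common benchmark the abstract recommends.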
Pages: 13