Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework

Cited by: 5
Authors
Weichenberger, Christian X. [1]
Palermo, Antonia [1]
Pramstaller, Peter P. [1]
Domingues, Francisco S. [1]
Affiliations
[1] Univ Lubeck, European Acad Bozen Bolzano EURAC, Ctr Biomed, Viale Druso 1, I-39100 Bolzano, Italy
Source
SCIENTIFIC REPORTS | 2017 / Vol. 7
Keywords
GENE ONTOLOGY ANNOTATIONS; SEMANTIC SIMILARITY; INFORMATION-CONTENT; DAGO-FUN; SEQUENCE; TERMS; TOOL; EXPRESSION; PREDICTION;
DOI
10.1038/s41598-017-00465-5
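Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09
Abstract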
Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms involves calculation of GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as "mixing strategy", which involves combining the SS values to yield the final FS value. Due to the variability of protein annotation caused, for example, by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies, we demonstrate a moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs, and discuss in this context the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
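The two-step scheme in the abstract (pairwise SS values, then a mixing strategy, then a z-score against a background) can be illustrated with a minimal Python sketch. This is not the Frela implementation: the best-match average is assumed here as one common mixing strategy, the z-score is assumed to be the plain mean and standard deviation form, and the function names and toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def best_match_average(ss):
    """Mix a matrix of term-term SS values into a single FS value.

    ss[i][j] is the semantic similarity between term i of protein A
    and term j of protein B. Best-match average is an assumed choice
    of mixing strategy, not necessarily the one used in the paper.
    """
    row_best = [max(row) for row in ss]        # best partner for each term of A
    col_best = [max(col) for col in zip(*ss)]  # best partner for each term of B
    return (mean(row_best) + mean(col_best)) / 2


def z_score(fs, background):
    """Standardize an FS value against a protein's background FS values.

    Assumes a plain (mean, population standard deviation) z-score; the
    paper's exact definition is not given in the abstract.
    """
    sigma = pstdev(background)
    if sigma == 0:
        raise ValueError("background distribution has zero variance")
    return (fs - mean(background)) / sigma


# Toy data (hypothetical): two proteins with two GO terms each.
ss_matrix = [[1.0, 0.2],
             [0.4, 0.6]]
fs = best_match_average(ss_matrix)

# Hypothetical FS values of protein A against other, unrelated proteins.
background = [0.2, 0.4, 0.6]
z = z_score(fs, background)
```

The point of the standardization is that an FS value is judged relative to how similar the protein tends to look to everything else, so a richly annotated protein does not appear similar to all partners by default.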
Pages: 15