A New Path Based Hybrid Measure for Gene Ontology Similarity

Cited by: 20
Authors
Bandyopadhyay, Sanghamitra [1]
Mallick, Koushik [2]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, W Bengal, India
[2] RCC Inst Informat Technol, CSE Dept, Kolkata 700015, W Bengal, India
Keywords
Gene ontology similarity; semantic similarity; term similarity; information content; protein interaction prediction; functional classification of genes; microRNA; SEMANTIC SIMILARITY; PROTEIN-INTERACTION; SACCHAROMYCES-CEREVISIAE; FUNCTIONAL SIMILARITY; R PACKAGE; DATABASE; GO; SEQUENCE; NETWORK; TOOLS;
DOI
10.1109/TCBB.2013.149
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Gene Ontology (GO) is a controlled vocabulary of terms used to annotate genes and gene products, structured as a directed acyclic graph. In the graph, semantic relations connect the terms, which represent knowledge about the function and cellular component of gene products. GO similarity gives a numerical representation of the biological relationship among a set of genes, which can be used to infer biological facts such as protein interaction, structural similarity, and gene clustering. Here we introduce a new shortest-path-based hybrid measure of ontological similarity between two terms that combines the structure of the GO graph with the information content of the terms. The similarity between two terms t_1 and t_2, referred to as GOSim_PBHM(t_1, t_2), has two components: one obtained from the common ancestors of t_1 and t_2, and the other from their remaining ancestors. The proposed path-based hybrid measure does not suffer from the well-known shallow annotation problem. Its superiority over some other popular measures is established for protein-protein interaction prediction, correlation with gene expression, and functional classification of genes in a biological pathway. Finally, the proposed measure is used to compute the average GO similarity score among genes that are experimentally validated targets of some microRNAs. The results demonstrate that the targets of a given miRNA have a high degree of similarity in the biological process category of GO.
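The abstract names two ingredients, the ancestor structure of the GO graph and the information content of terms, without giving the formula for GOSim_PBHM. The sketch below only illustrates those ingredients on a made-up toy graph; it is not the authors' measure. The term names, annotation counts, and helper functions are all hypothetical. Information content is taken in the common form IC(t) = -log p(t), and each term is counted among its own ancestors, which is an assumption of this sketch.

```python
import math

# Hypothetical toy DAG (not real GO terms): child -> set of parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_ANNOTATIONS = {"root": 0, "A": 1, "B": 2, "C": 3, "D": 2}


def ancestors(term):
    """Return all ancestors of `term`, including the term itself (assumed convention)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); p(t) is the share of annotations on t or its descendants."""
    total = sum(DIRECT_ANNOTATIONS.values())
    covered = sum(n for t, n in DIRECT_ANNOTATIONS.items() if term in ancestors(t))
    return -math.log(covered / total)


def split_ancestors(t1, t2):
    """Split the ancestors of two terms into common and remaining (non-shared) sets."""
    a1, a2 = ancestors(t1), ancestors(t2)
    common = a1 & a2
    remaining = (a1 | a2) - common
    return common, remaining


common, remaining = split_ancestors("C", "D")
print(sorted(common), sorted(remaining))
print({t: round(information_content(t), 3) for t in PARENTS})
```

In this toy graph, terms deeper in the hierarchy cover fewer annotations and so receive higher information content, while the root receives zero. A hybrid measure of the kind the abstract describes would combine such values with path lengths over the common and remaining ancestor sets; how it does so is defined in the paper itself.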
Pages: 116-127
Page count: 12