Defining functional distances over Gene Ontology

Cited by: 44
Authors
del Pozo, Angela [1]
Pazos, Florencio [2]
Valencia, Alfonso [1]
Affiliations
[1] CNIO, Struct Biol & Biocomp Programme, E-28029 Madrid, Spain
[2] CSIC, Natl Biotechnol Ctr, CNB, Computat Syst Biol Grp, E-28049 Madrid, Spain
DOI
10.1186/1471-2105-9-50
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: A fundamental problem in defining the functional relationships between proteins is the difficulty of quantifying functional similarity, even when well-structured ontologies of protein activity exist (e.g. the Gene Ontology, GO). Functional metrics can overcome the problems in comparing and evaluating functional assignments and predictions. Previous approaches to comparing GO terms took as their reference of proximity the linkage between terms in the ontology, weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Directed Acyclic Graph. Here, we follow a different approach to quantifying functional similarities between GO terms.
Results: We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of InterPro entries, instead of relying on the structure of the GO. The co-occurrence of GO terms reveals natural biological links between GO functions and defines a distance model D_f that fulfils the properties of a metric space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'.
Conclusion: The proposed method provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines biologically meaningful groups, which enhances its utility for protein function comparison and prediction. Finally, this approach could be used for function-based protein searches in databases and for analysing the gene clusters produced by DNA array experiments.
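The co-occurrence idea in the Results can be illustrated with a small sketch. The abstract does not give the formula for D_f, so the code below substitutes the Jaccard distance over sets of InterPro entries, a standard co-occurrence measure that is known to satisfy the metric axioms. The mapping `interpro2go`, its identifiers, and all function names are hypothetical placeholders, not data or code from the paper.

```python
from itertools import product

# Hypothetical toy annotation: InterPro entry -> GO terms assigned to it.
# Identifiers are placeholders, not real InterPro/GO mappings.
interpro2go = {
    "IPR_A": {"GO:1", "GO:2"},
    "IPR_B": {"GO:1", "GO:3"},
    "IPR_C": {"GO:3", "GO:4"},
    "IPR_D": {"GO:4"},
}

def entries_per_term(mapping):
    """Invert the mapping: GO term -> set of InterPro entries that carry it."""
    inverted = {}
    for entry, terms in mapping.items():
        for term in terms:
            inverted.setdefault(term, set()).add(entry)
    return inverted

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|. Assumed stand-in; NOT the paper's D_f."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(mapping):
    """Pairwise distances between all GO terms, keyed by (term, term)."""
    term_entries = entries_per_term(mapping)
    terms = sorted(term_entries)
    return {
        (s, t): jaccard_distance(term_entries[s], term_entries[t])
        for s, t in product(terms, repeat=2)
    }

dist = distance_matrix(interpro2go)
terms = sorted({s for s, _ in dist})

# Spot-check the triangle inequality on the toy data.
triangle_ok = all(
    dist[(a, c)] <= dist[(a, b)] + dist[(b, c)] + 1e-12
    for a, b, c in product(terms, repeat=3)
)
print(dist[("GO:1", "GO:2")], triangle_ok)
```

A matrix of this kind could then be passed to any agglomerative clustering routine to obtain a hierarchical tree, in the spirit of the 'Functional Tree' described above; the choice of linkage would be a further assumption.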
Pages: 15