Measure the Semantic Similarity of GO Terms Using Aggregate Information Content

Cited by: 37
Authors
Song, Xuebo [1]
Li, Lin [2]
Srimani, Pradip K. [1]
Yu, Philip S. [3]
Wang, James Z. [1]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] Murray State Univ, Dept Comp Sci & Informat Syst, Murray, KY 42071 USA
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Science Foundation (USA)
Keywords
Gene ontology; GO similarity; gene expression; G-SESAME; FUNCTIONAL SIMILARITY; GENE-EXPRESSION; PROTEIN-INTERACTION; ONTOLOGY; IDENTIFICATION; CEREVISIAE; CLUSTERS; DATABASE;
DOI
10.1109/TCBB.2013.176
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The rapid development of the Gene Ontology (GO) and the huge amount of biomedical data annotated with GO terms necessitate computing the semantic similarity of GO terms and, in turn, measuring the functional similarity of genes based on their annotations. In this paper we propose a novel and efficient method to measure the semantic similarity of GO terms. The proposed method addresses the limitations of existing GO term similarity measurement techniques; it computes the semantic content of a GO term by considering the information content of all of its ancestor terms in the graph. The aggregate information content (AIC) of all ancestor terms of a GO term implicitly reflects the GO term's location in the GO graph and also represents how human beings use this GO term and all its ancestor terms to annotate genes. We show that the semantic similarity of GO terms obtained by our method closely matches human perception. Extensive experimental studies show that this novel method also outperforms all existing methods in terms of correlation with gene expression data. We have developed web services for measuring the semantic similarity of GO terms and the functional similarity of genes using the proposed AIC method and other popular methods. These web services are available at http://bioinformatics.clemson.edu/G-SESAME.
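The idea of aggregating information content over all ancestors can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from this record: the toy graph `PARENTS`, the annotation counts `COUNTS`, the logistic weighting in `semantic_weight`, and the normalization in `similarity` are hypothetical choices, since the abstract does not state the exact formulas.

```python
import math

# Hypothetical toy DAG (child -> parents); these are NOT real GO terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}

# Hypothetical counts of genes annotated to each term or its descendants.
COUNTS = {"root": 100, "A": 60, "B": 50, "C": 20, "D": 10}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t), where p(t) is the term's annotation frequency."""
    return -math.log(COUNTS[term] / COUNTS["root"])


def semantic_weight(term):
    """Assumed weighting: logistic function of 1/IC, so a higher IC gives a lower weight."""
    ic = information_content(term)
    if ic == 0.0:  # the root carries no information; treat its weight as 1
        return 1.0
    return 1.0 / (1.0 + math.exp(-1.0 / ic))


def aggregate_value(term):
    """Aggregate the weights of the term and all of its ancestors."""
    return sum(semantic_weight(t) for t in ancestors(term))


def similarity(a, b):
    """Assumed normalization: shared ancestor weight over the two aggregate values."""
    common = ancestors(a) & ancestors(b)
    shared = sum(semantic_weight(t) for t in common)
    return 2.0 * shared / (aggregate_value(a) + aggregate_value(b))


print(similarity("C", "D"), similarity("C", "B"))
```

In this toy graph, `C` and `D` share the ancestors `A` and `root`, whereas `C` and `B` share only `root`, so the first pair scores higher. Because every ancestor contributes to the sum, the score depends on where the terms sit in the graph and not only on a single common ancestor.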
Pages: 468-476
Page count: 9