Measuring semantic similarities by combining gene ontology annotations and gene co-function networks

Times cited: 45
Authors
Peng, Jiajie [1,2]
Uygun, Sahra [2,3]
Kim, Taehyong [4]
Wang, Yadong [1]
Rhee, Seung Y. [4]
Chen, Jin [2,5]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
[2] Michigan State Univ, Dept Energy, Plant Res Lab, E Lansing, MI 48824 USA
[3] Michigan State Univ, Genet Program, E Lansing, MI 48824 USA
[4] Carnegie Inst Sci, Dept Plant Biol, Stanford, CA 94305 USA
[5] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
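Source
BMC BIOINFORMATICS | 2015 / Volume 16
Funding
National High Technology Research and Development Program of China (863 Program); US National Science Foundation
Keywords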
Co-Function network; Gene ontology; Semantic similarity; Gene function annotation; METABOLIC PATHWAYS; FUNCTIONAL SIMILARITY; GENOME; ASSOCIATION; INFORMATION; DATABASE; CATEGORIZATION; PREDICTION; RESOURCE; TAXONOMY;
DOI
10.1186/s12859-015-0474-7
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Gene Ontology (GO) has been widely used to study functional relationships between genes. Current semantic similarity measures rely only on GO annotations and the GO structure. This limits the power of GO-based similarity, because only a limited proportion of genes are annotated to GO in most organisms. Results: We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM to large Arabidopsis gene families, such as the cytochrome P450 monooxygenases, to group their members functionally, and show that this grouping could facilitate functional characterization of genes in these families. Conclusions: Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure can be significantly improved by incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited. Supplementary information and software are available at http://www.msu.edu/~jinchen/NETSIM.
Pages: 14