InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk

Cited by: 82
Authors
Cheng, Liang [1]
Jiang, Yue [2]
Ju, Hong [3]
Sun, Jie [1]
Peng, Jiajie [4]
Zhou, Meng [1]
Hu, Yang [5]
Affiliations
[1] Harbin Med Univ, Coll Bioinformat Sci & Technol, Harbin 150081, Heilongjiang, Peoples R China
[2] Hosp Sick Children, Toronto, ON M5G 1X8, Canada
[3] Heilongjiang Biol Sci & Technol Career Acad, Dept Informat Engn, Harbin 150081, Heilongjiang, Peoples R China
[4] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[5] Harbin Inst Technol, Sch Life Sci & Technol, Harbin 150088, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Biomedical ontology; Term similarities; Random walk; Information flow; HUMAN PHENOTYPE ONTOLOGY; GENE ONTOLOGY; SEMANTIC SIMILARITY; METABOLIC PATHWAYS; DATABASE; DISEASE; ANNOTATION; KNOWLEDGE; METACYC; BIOLOGY;
DOI
10.1186/s12864-017-4338-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Since the establishment of the first biomedical ontology, the Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. More than 300 ontologies have now been built, including the widely used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because it can reveal novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Although similarities between terms within a single ontology have been studied with in silico methods, term similarities across different ontologies have not been investigated as deeply. The latest method used a gene functional interaction network (GFIN) to explore such inter-ontology term similarities. However, it used only direct gene interactions and did not make full use of the connectivity among the gene nodes of the network. In addition, all existing methods were designed specifically for GO, and their performance on the wider ontology community remains unknown.
Results: We propose a method, InfAcrOnt, to infer similarities between terms across ontologies using the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and obtains similarities between terms across ontologies by modeling the information flow within the network with a random walk. In our benchmark experiments on the sub-ontologies of GO, InfAcrOnt achieves a high average area under the receiver operating characteristic curve (AUC) (0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) on the human and yeast benchmark datasets, respectively, outperforming the compared methods. In addition, comparisons of InfAcrOnt results with prior knowledge on pairwise DO-HPO terms and pairwise DO-GO terms show high correlations.
Conclusions: The experimental results show that InfAcrOnt significantly improves the performance of inferring similarities between terms across ontologies on the benchmark sets.
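The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a random walk with restart on a small, hypothetical term-gene-gene network, and it assumes that the similarity between two terms can be read off as the symmetrized visiting probability. The node layout, the edge list, the restart probability of 0.5, and the function names `random_walk_with_restart` and `cross_term_similarity` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy network (not data from the paper).
# Nodes 0-2 are ontology terms, nodes 3-6 are genes.
A, B, C = 0, 1, 2            # term A (ontology 1), terms B and C (ontology 2)
G1, G2, G3, G4 = 3, 4, 5, 6  # genes
EDGES = [
    (A, G1), (A, G2),        # annotations of term A
    (B, G2), (B, G3),        # annotations of term B
    (C, G4),                 # annotation of term C
    (G1, G3), (G3, G4),      # gene-gene functional interactions
]


def build_adjacency(edges, n_nodes):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        adj[u, v] = 1.0
        adj[v, u] = 1.0
    return adj


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # guard against isolated nodes
    W = adj / col_sums                   # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0                       # the walker starts (and restarts) at the seed
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def cross_term_similarity(adj, term_a, term_b, restart=0.5):
    """Assumed similarity: mean of the two directed visiting probabilities."""
    p_a = random_walk_with_restart(adj, term_a, restart)
    p_b = random_walk_with_restart(adj, term_b, restart)
    return 0.5 * (p_a[term_b] + p_b[term_a])


ADJ = build_adjacency(EDGES, 7)
sim_ab = cross_term_similarity(ADJ, A, B)
sim_ac = cross_term_similarity(ADJ, A, C)
print(f"sim(A, B) = {sim_ab:.4f}, sim(A, C) = {sim_ac:.4f}")
```

In this toy network, terms A and B share an annotated gene, while A and C are linked only through a chain of gene interactions, so the sketch scores A closer to B than to C. Reaching C at all depends on the gene-gene edges, which is the kind of network connectivity the abstract says earlier methods did not fully use.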
Pages: 10