Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing

Cited by: 33
Authors
Zhao, Yingwen [1]
Fu, Guangyuan [1]
Wang, Jun [1]
Guo, Maozu [2,3]
Yu, Guoxian [1]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] Beijing Univ Civil Engn & Architecture, Sch Elect & Informat Engn, Beijing 100044, Peoples R China
[3] Beijing Key Lab Intelligent Proc Bldg Big Data, Beijing 100044, Peoples R China
Keywords
Gene Ontology; Gene function prediction; Hierarchy preserving hashing; Semantic similarity; PROTEIN FUNCTION; SIMILARITY; ANNOTATIONS; NETWORK; ASSOCIATIONS; SEQUENCE; FEATURES;
DOI
10.1016/j.ygeno.2018.02.008
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is difficult. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms and to optimize a series of hash functions that encode the massive set of GO terms as compact binary codes. After that, HPHash uses these hash functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we use HPHash as a plugin for BLAST based gene function prediction, where it again significantly improves the prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.
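The pipeline outlined in the abstract (hash the GO term space into short binary codes, then predict functions from gene similarity in that compact space) can be illustrated with a minimal sketch. This is not the authors' HPHash implementation: the hash functions below are random sign projections rather than hash functions optimized to preserve the GO hierarchy, the taxonomic similarity step is omitted, and all function names, the toy annotation matrix, and the parameters `num_bits` and `k` are assumptions made for illustration.

```python
import random


def random_hash_functions(num_terms, num_bits, seed=0):
    """Draw random projection vectors, one per bit.

    Assumption: a stand-in for HPHash's learned, hierarchy-preserving
    hash functions, which are not reproduced here.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_terms)]
            for _ in range(num_bits)]


def encode(annotation_row, hash_functions):
    """Project one gene's binary term vector onto a compact binary code."""
    return [1 if sum(w * a for w, a in zip(h, annotation_row)) > 0 else 0
            for h in hash_functions]


def hamming_similarity(code_a, code_b):
    """Fraction of bit positions on which two codes agree."""
    matches = sum(1 for x, y in zip(code_a, code_b) if x == y)
    return matches / len(code_a)


def predict_scores(gene_index, annotations, codes, k=2):
    """Score each term for one gene from its k most similar genes."""
    sims = [(hamming_similarity(codes[gene_index], codes[j]), j)
            for j in range(len(codes)) if j != gene_index]
    # Most similar first; ties are broken by the lower gene index.
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    neighbours = sims[:k]
    total = sum(s for s, _ in neighbours)
    num_terms = len(annotations[0])
    if total == 0:
        return [0.0] * num_terms
    # Similarity-weighted average of the neighbours' annotations.
    return [sum(s * annotations[j][t] for s, j in neighbours) / total
            for t in range(num_terms)]


# Toy gene-term association matrix (rows: genes, columns: GO terms).
annotations = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
]
hash_functions = random_hash_functions(num_terms=6, num_bits=4)
codes = [encode(row, hash_functions) for row in annotations]
scores = predict_scores(0, annotations, codes, k=1)
```

Genes with identical annotations always receive identical codes under this scheme, so gene 0 here inherits the annotations of gene 1 when `k=1`. In the method the abstract describes, the hash functions are instead optimized so that the binary codes keep the hierarchical order between GO terms.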
Pages: 334-342
Number of pages: 9