Computational algorithms to predict Gene Ontology annotations

Cited by: 17
Authors
Pinoli, Pietro [1]
Chicco, Davide [1,2]
Masseroli, Marco [1]
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, I-20133 Milan, Italy
[2] University of California, Irvine, Institute for Genomics and Bioinformatics, Irvine, CA, USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
SEMANTIC ANALYSIS; INFORMATION; GENOME
DOI
10.1186/1471-2105-16-S6-S4
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as those provided by the Gene Ontology Consortium, are used to design novel biological experiments and to interpret their results. Despite their importance, these sources of information have some known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in terms of both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful.
Methods: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We adopted the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and we propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set.
Results: We tested our methods and their weighted variants on the Gene Ontology annotation sets of genes from three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and with the algorithm considered.
Conclusions: Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, showing that it can be a helpful tool for supporting scientists in the curation of gene functional annotations.
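The abstract describes the latent semantic analysis approach only at a high level. As an illustration, the Python sketch below shows the general idea behind LSI-style annotation prediction: a binary gene × term annotation matrix is approximated by a truncated singular value decomposition (SVD), and gene-term pairs that are unannotated in the input but receive a high reconstructed score are proposed as candidate annotations. The function name `predict_annotations`, the rank `k`, the `threshold`, and the optional `weights` array are hypothetical choices made for this example. The sketch does not reproduce the authors' Probabilistic Latent Semantic Analysis or Semantic IMproved Latent Semantic Analysis implementations, nor their specific weighting schemes.

```python
import numpy as np


def predict_annotations(A, k, threshold, weights=None):
    """LSI-style sketch: score gene-term pairs via a rank-k SVD reconstruction.

    A         -- binary gene x term matrix (1 = known annotation); hypothetical input
    k         -- number of latent dimensions kept (assumed parameter)
    threshold -- minimum reconstructed score for a prediction (assumed parameter)
    weights   -- optional per-annotation weights with the same shape as A
                 (hypothetical stand-in for an annotation weighting scheme)
    Returns (gene, term, score) tuples for pairs NOT annotated in A,
    sorted by decreasing score.
    """
    A = np.asarray(A, dtype=float)
    # Replace the 1s of A with their weights, if weights are given.
    X = A if weights is None else A * np.asarray(weights, dtype=float)

    # Truncated SVD: keep only the k largest singular values and vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Candidate annotations: absent from the input but scored above threshold.
    rows, cols = np.nonzero((A == 0) & (X_k > threshold))
    preds = [(int(i), int(j), float(X_k[i, j])) for i, j in zip(rows, cols)]
    return sorted(preds, key=lambda p: p[2], reverse=True)


# Toy example: genes 0-2 are annotated to terms 0-2; gene 3 lacks term 2.
A = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1],
              [1, 1, 0]])
print(predict_annotations(A, k=1, threshold=0.5))
```

With k = 1, the rank-1 reconstruction assigns the missing pair (gene 3, term 2) a score of roughly 0.61, so it is returned as the only candidate annotation. In practice, k and the threshold would need tuning on real annotation sets, which is consistent with the abstract's observation that results depend on the size of the input set and on the algorithm.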
Pages: 15
Related papers
50 in total
  • [41] Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis
    Quick, Corbin
    Wen, Xiaoquan
    Abecasis, Goncalo
    Boehnke, Michael
    Kang, Hyun Min
    PLOS GENETICS, 2020, 16 (12)
  • [42] Computational Approaches for Gene Prediction: A Comparative Survey
    Al-Turaiki, Israa M.
    Mathkour, Hassan
    Touir, Ameur
    Hammami, Saleh
    INFORMATICS ENGINEERING AND INFORMATION SCIENCE, PT II, 2011, 252 : 14 - 25
  • [43] Gene ontology based quantitative index to select functionally diverse genes
    Paul, Sushmita
    Maji, Pradipta
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2014, 5 (02) : 245 - 262
  • [44] Combining sequence and Gene Ontology for protein module detection in the Weighted Network
    Yu, Yang
    Liu, Jie
    Feng, Nuan
    Song, Bo
    Zheng, Zeyu
    JOURNAL OF THEORETICAL BIOLOGY, 2017, 412 : 107 - 112
  • [45] Experimental correlation analysis of bicluster coherence measures and gene ontology information
    Padilha, Victor Alexandre
    de Leon Ferreira de Carvalho, Andre Carlos Ponce
    APPLIED SOFT COMPUTING, 2019, 85
  • [46] Combining Expression Data and Knowledge Ontology for Gene Clustering and Network Reconstruction
    Lee, Wei-Po
    Lin, Chung-Hsun
    COGNITIVE COMPUTATION, 2016, 8 (02) : 217 - 227
  • [47] Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network
    Karadeniz, Ilknur
    Hur, Junguk
    He, Yongqun
    Ozgur, Arzucan
    FRONTIERS IN MICROBIOLOGY, 2015, 6
  • [48] Gene networks in Drosophila melanogaster: integrating experimental data to predict gene function
    Costello, James C.
    Dalkilic, Mehmet M.
    Beason, Scott M.
    Gehlhausen, Jeff R.
    Patwardhan, Rupali
    Middha, Sumit
    Eads, Brian D.
    Andrews, Justen R.
    GENOME BIOLOGY, 2009, 10 (09)
  • [49] GAIL: An interactive webserver for inference and dynamic visualization of gene-gene associations based on gene ontology guided mining of biomedical literature
    Couch, Daniel
    Yu, Zhenning
    Nam, Jin Hyun
    Allen, Carter
    Ramos, Paula S.
    da Silveira, Willian A.
    Hunt, Kelly J.
    Hazard, Edward S.
    Hardiman, Gary
    Lawson, Andrew
    Chung, Dongjun
    PLOS ONE, 2019, 14 (07)
  • [50] GOGO: An improved algorithm to measure the semantic similarity between gene ontology terms
    Zhao, Chenguang
    Wang, Zheng
    SCIENTIFIC REPORTS, 2018, 8