Computational algorithms to predict Gene Ontology annotations

Cited by: 17
Authors
Pinoli, Pietro [1]
Chicco, Davide [1, 2]
Masseroli, Marco [1]
Affiliations
[1] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, I-20133 Milan, Italy
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Irvine, CA USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
SEMANTIC ANALYSIS; INFORMATION; GENOME;
DOI
10.1186/1471-2105-16-S6-S4
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as those provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful.
Methods: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We followed the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and we propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set.
Results: We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered.
Conclusions: Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, proving to be a helpful tool for supporting scientists in the curation of gene functional annotations.
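The Latent Semantic Indexing step named in the Methods can be illustrated with a truncated singular value decomposition of a binary gene-by-term annotation matrix. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `lsi_predict`, the toy matrix, the rank and the threshold are all hypothetical, and the weighting schemes and the clustering step of the Semantic IMproved variant are not reproduced because the abstract does not specify them.

```python
import numpy as np

def lsi_predict(annotations, rank, threshold):
    """Suggest candidate annotations from a rank-reduced reconstruction.

    annotations: 2-D 0/1 array, rows are genes, columns are ontology terms.
    Returns the list of (gene, term) index pairs that are absent from the
    input but score at least `threshold` in the reconstruction, together
    with the full score matrix.
    """
    A = np.asarray(annotations, dtype=float)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(s))
    # Keep only the k largest singular values (the "latent" dimensions).
    A_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # A candidate is a currently missing annotation with a high score.
    candidates = (A == 0) & (A_hat >= threshold)
    pairs = [(int(g), int(t)) for g, t in np.argwhere(candidates)]
    return pairs, A_hat

# Hypothetical toy data: 4 genes x 4 terms.
A = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])

predicted, scores = lsi_predict(A, rank=2, threshold=0.5)
print(predicted)
```

In this toy matrix, gene 1 shares terms 0 and 1 with genes 0 and 2, which are both also annotated to term 2, so the pair (1, 2) is the one suggested. Keeping every singular value would reproduce the input exactly and suggest nothing, which is why the rank has to be lower than the full rank of the matrix for the method to generalize.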
Pages: 15