Computational algorithms to predict Gene Ontology annotations

Cited by: 17
Authors
Pinoli, Pietro [1]
Chicco, Davide [1, 2]
Masseroli, Marco [1]
Affiliations
[1] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, I-20133 Milan, Italy
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Irvine, CA USA
Source
BMC BIOINFORMATICS | 2015 / Volume 16
Keywords
SEMANTIC ANALYSIS; INFORMATION; GENOME;
DOI
10.1186/1471-2105-16-S6-S4
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as those provided by the Gene Ontology Consortium, are used to design novel biological experiments and to interpret their results. Despite their importance, these information sources have known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful.
Methods: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We followed the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and we propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set.
Results: We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered.
Conclusions: Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it predicts a significant number of novel annotations, which makes it a helpful tool for supporting scientists in the curation of gene functional annotations.
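The Latent Semantic Indexing idea mentioned in the Methods can be illustrated with a truncated singular value decomposition of a binary gene-by-term annotation matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `lsi_predict`, the toy matrix, the rank and the threshold are all hypothetical choices made for illustration, and the weighting schemes and the clustering step of the proposed method are not covered.

```python
import numpy as np


def lsi_predict(annotations, rank, threshold):
    """Score gene-term pairs with a rank-`rank` SVD reconstruction.

    Returns the reconstructed score matrix and the (gene, term) index
    pairs that are absent from the input but score at least `threshold`.
    """
    A = np.asarray(annotations, dtype=float)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the `rank` strongest latent components.
    scores = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Candidate predictions: unannotated pairs with a high reconstructed score.
    hits = np.argwhere((A == 0) & (scores >= threshold))
    return scores, [(int(g), int(t)) for g, t in hits]


# Hypothetical toy data: rows are genes, columns are ontology terms.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # gene 2 lacks term 1, which the similar genes 0 and 1 have
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
scores, predicted = lsi_predict(toy, rank=2, threshold=0.4)
print(predicted)
```

On this toy matrix the only unannotated pair that clears the threshold is gene 2 with term 1, because the low-rank reconstruction pulls gene 2 toward the annotation profile of genes 0 and 1. In practice both the rank and the threshold would need tuning against held-out annotations.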
Pages: 15
Related papers
50 in total
  • [21] Computational Models and Algorithms for the Single Individual Haplotyping Problem
    Xie, Minzhu
    Wang, Jianxin
    Chen, Jianer
    Wu, Jingli
    Liu, Xucong
    CURRENT BIOINFORMATICS, 2010, 5 (01) : 18 - 28
  • [22] Integration of molecular network data reconstructs Gene Ontology
    Gligorijevic, Vladimir
    Janjic, Vuk
    Przulj, Natasa
    BIOINFORMATICS, 2014, 30 (17) : I594 - I600
  • [23] The Experimental Proteome of Leishmania infantum Promastigote and Its Usefulness for Improving Gene Annotations
    Sanchiz, Africa
    Morato, Esperanza
    Rastrojo, Alberto
    Camacho, Esther
    Gonzalez-de La Fuente, Sandra
    Marina, Anabel
    Aguado, Begona
    Requena, Jose M.
    GENES, 2020, 11 (09) : 1 - 20
  • [24] Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder
    Bartsch, Georg, Jr.
    Mitra, Anirban P.
    Mitra, Sheetal A.
    Almal, Arpit A.
    Steven, Kenneth E.
    Skinner, Donald G.
    Fry, David W.
    Lenehan, Peter F.
    Worzel, William P.
    Cote, Richard J.
    JOURNAL OF UROLOGY, 2016, 195 (02) : 493 - 498
  • [25] Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis
    Zhang, Yijuan
    Akintola, Oluwafemi S.
    Liu, Ken J. A.
    Sun, Bingyun
    GENE, 2016, 575 (02) : 559 - 566
  • [26] Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus
    Athanasouli, Marina
    Witte, Hanh
    Weiler, Christian
    Loschko, Tobias
    Eberhardt, Gabi
    Sommer, Ralf J.
    Roedelsperger, Christian
    BMC GENOMICS, 2020, 21 (01)
  • [27] Essential Protein Discovery based on Network Motif and Gene Ontology
    Kim, Wooyoung
    Li, Min
    Wang, Jianxin
    Pan, Yi
    2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011), 2011, : 470 - 475
  • [28] Computational Algorithms Derived from Multiple Scales of Neocortical Processing
    Ingber, Lester
    COGNITIVE COMPUTATION, 2012, 4 (01) : 38 - 50
  • [29] Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus
    Marina Athanasouli
    Hanh Witte
    Christian Weiler
    Tobias Loschko
    Gabi Eberhardt
    Ralf J. Sommer
    Christian Rödelsperger
    BMC Genomics, 21
  • [30] Automatic, context-specific generation of Gene Ontology slims
    Davis, Melissa J.
    Sehgal, Muhammad Shoaib B.
    Ragan, Mark A.
    BMC BIOINFORMATICS, 2010, 11