Computational algorithms to predict Gene Ontology annotations

Cited by: 17
Authors
Pinoli, Pietro [1]
Chicco, Davide [1, 2]
Masseroli, Marco [1]
Affiliations
[1] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, I-20133 Milan, Italy
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Irvine, CA USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
SEMANTIC ANALYSIS; INFORMATION; GENOME;
DOI
10.1186/1471-2105-16-S6-S4
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful.
Methods: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We followed the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and we propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set.
Results: We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods were able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered.
Conclusions: Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, proving to be a helpful tool for supporting scientists in the curation of gene functional annotations.
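The abstract names Latent Semantic Indexing as one of the methods but gives no implementation details. As a rough illustration only, the general idea of predicting annotations from a low-rank reconstruction of a gene-by-term matrix might be sketched as below. This is a generic truncated-SVD sketch, not the authors' code: the function name `lsi_predict`, the rank `k`, the threshold `tau`, and the toy matrix are all assumptions made for this example, and the weighting schemes, Probabilistic Latent Semantic Analysis, and the clustering step mentioned in the abstract are not covered.

```python
import numpy as np

def lsi_predict(A, k, tau):
    """Hypothetical helper: rank-k SVD reconstruction of a binary
    gene-by-term annotation matrix A (rows = genes, columns = terms).

    Returns the reconstructed score matrix and the (gene, term) index
    pairs that are absent in A but score at least tau after truncation.
    """
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values (rank-k approximation).
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Candidate predictions: entries that are 0 in the input but score high.
    candidates = (A == 0) & (A_k >= tau)
    pairs = [tuple(map(int, ix)) for ix in np.argwhere(candidates)]
    return A_k, pairs

# Toy data (assumed): genes 0-2 share terms 0-2, except gene 2 lacks term 2.
A_toy = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
], dtype=float)

scores, predicted = lsi_predict(A_toy, k=1, tau=0.5)
print(predicted)
```

On this toy matrix the only suggested annotation is gene 2 with term 2, because gene 2 otherwise matches the profile of genes 0 and 1. In practice the choices of `k` and `tau` would need to be tuned and validated against held-out or later-released annotations.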
Pages: 15
Related papers
50 in total
  • [31] Protein annotation from protein interaction networks and Gene Ontology
    Nguyen, Cao D.
    Gardiner, Katheleen J.
    Cios, Krzysztof J.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2011, 44 (05) : 824 - 829
  • [32] LEARNING WITH GENE ONTOLOGY ANNOTATION USING FEATURE SELECTION AND CONSTRUCTION
    Akand, Elma
    Bain, Michael
    Temple, Mark
    APPLIED ARTIFICIAL INTELLIGENCE, 2010, 24 (1-2) : 5 - 38
  • [33] Filtering Association Rules in Gene Ontology Based on Term Specificity
    Shui, Yong
    Cho, Young-Rae
    2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016, : 1314 - 1321
  • [34] GOseek: A Gene Ontology Search Engine using Enhanced Keywords
    Taha, Kamal
    2013 35TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2013, : 1502 - 1505
  • [35] Integration of Gene Expression and Ontology for Clustering Functionally Similar Genes
    Paul, Sushmita
    ROUGH SETS, 2017, 10313 : 587 - 598
  • [36] Candidate Gene Identification for Systemic Lupus Erythematosus Using Network Centrality Measures and Gene Ontology
    Siddani, Bhaskara Rao
    Pochineni, Lakshmi Priyanka
    Palanisamy, Manimaran
    PLOS ONE, 2013, 8 (12):
  • [37] Ontology based text mining of gene-phenotype associations: application to candidate gene prediction
    Kafkas, Senay
    Hoehndorf, Robert
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2019,
  • [38] RuleGO: a logical rules-based tool for description of gene groups by means of Gene Ontology
    Gruca, Aleksandra
    Sikora, Marek
    Polanski, Andrzej
    NUCLEIC ACIDS RESEARCH, 2011, 39 : W293 - W301
  • [39] Database for exchangeable gene trap clones: Pathway and gene ontology analysis of exchangeable gene trap clone mouse lines
    Araki, Masatake
    Nakahara, Mai
    Muta, Mayumi
    Itou, Miharu
    Yanai, Chika
    Yamazoe, Fumika
    Miyake, Mikiko
    Morita, Ayaka
    Araki, Miyuki
    Okamoto, Yoshiyuki
    Nakagata, Naomi
    Yoshinobu, Kumiko
    Yamamura, Ken-ichi
    Araki, Kimi
    DEVELOPMENT GROWTH & DIFFERENTIATION, 2014, 56 (02) : 161 - 174
  • [40] Computational studies to predict or explain G protein coupled receptor polypharmacology
    Jacobson, Kenneth A.
    Costanzi, Stefano
    Paoletta, Silvia
    TRENDS IN PHARMACOLOGICAL SCIENCES, 2014, 35 (12) : 658 - 663