Improving protein function prediction using protein sequence and GO-term similarities

Cited by: 25
Authors
Makrodimitris, Stavros [1, 2]
van Ham, Roeland C. H. J. [1, 2]
Reinders, Marcel J. T. [1]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, Dept Intelligent Syst, NL-2628 CD Delft, Netherlands
[2] Keygene NV, Dept Bioinformat, NL-6708 PW Wageningen, Netherlands
Keywords
GENE ONTOLOGY; CLASSIFICATION
DOI
10.1093/bioinformatics/bty751
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Most automatic functional annotation methods assign Gene Ontology (GO) terms to proteins based on the annotations of highly similar proteins. We argue that less similar proteins are still informative. Moreover, despite their simplicity and structure, GO terms appear to be hard for computers to learn, in particular those of the Biological Process ontology, which has the most terms (more than 29 000). We propose using Label-Space Dimensionality Reduction (LSDR) techniques to exploit the redundancy of GO terms and transform them into a more compact latent representation that is easier to predict.
Results: We compare proteins using a sequence similarity profile (SSP), that is, a protein's similarities to a set of annotated training proteins. We introduce two new LSDR methods, one based on the structure of the GO and one based on the semantic similarity of terms. We show that these LSDR methods, as well as three existing ones, improve the Critical Assessment of Functional Annotation (CAFA) performance of several function prediction algorithms. Cross-validation experiments on Arabidopsis thaliana proteins show that our GO-aware LSDR outperforms generic LSDR. Our experiments on A. thaliana proteins also show that the SSP representation combined with a kNN classifier outperforms state-of-the-art and baseline methods in terms of cross-validated F-measure.
Availability and implementation: Source code for the experiments is available at https://github.com/stamakro/SSP-LSDR.
Supplementary information: Supplementary data are available at Bioinformatics online.
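The SSP-plus-kNN idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is in the repository linked above). It assumes that each SSP entry is a pairwise similarity score such as a BLAST bit score, that neighbors are ranked by cosine similarity between profiles, and that a GO term's score is the fraction of the k nearest training proteins annotated with it. The function names and toy term labels are hypothetical.

```python
"""Sketch: sequence similarity profile (SSP) features + kNN GO-term scoring.

Assumptions (not taken from the paper): profile entries are pairwise
similarity scores (e.g. BLAST bit scores), neighbors are ranked by cosine
similarity between profiles, and a term's score is the fraction of the
k nearest training proteins annotated with it.
"""
from math import sqrt


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length SSP vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_go_scores(
    query_ssp: list[float],
    train_ssp: list[list[float]],
    train_terms: list[set[str]],
    k: int = 3,
) -> dict[str, float]:
    """Score GO terms for one query protein from its k nearest training proteins.

    query_ssp[j] is the query's similarity to training protein j, and
    train_ssp[i] is training protein i's profile against the same training set.
    """
    # Rank training proteins by how similar their profile is to the query's.
    ranked = sorted(
        range(len(train_ssp)),
        key=lambda i: cosine(query_ssp, train_ssp[i]),
        reverse=True,
    )
    neighbors = ranked[:k]
    scores: dict[str, float] = {}
    # Each neighbor contributes an equal vote to every term it is annotated with.
    for i in neighbors:
        for term in train_terms[i]:
            scores[term] = scores.get(term, 0.0) + 1.0 / len(neighbors)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: 3 training proteins and their mutual similarity scores.
    train_ssp = [[10.0, 2.0, 0.0], [2.0, 10.0, 1.0], [0.0, 1.0, 10.0]]
    train_terms = [{"term_A", "term_B"}, {"term_A"}, {"term_C"}]
    scores = knn_go_scores([8.0, 3.0, 0.0], train_ssp, train_terms, k=2)
    print(sorted(scores.items()))  # [('term_A', 1.0), ('term_B', 0.5)]
```

Here the query's two nearest neighbors are the first two training proteins, so `term_A` receives a full vote and `term_B` half of one; thresholding such scores yields predicted annotations. An LSDR variant would instead have the classifier predict a compressed latent encoding of the label vectors and decode it back to GO terms; that step is omitted from this sketch.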
Pages: 1116-1124
Number of pages: 9
Related papers (50 in total)
  • [41] Protein Function Prediction with Incomplete Annotations
    Yu, Guoxian
    Rangwala, Huzefa
    Domeniconi, Carlotta
    Zhang, Guoji
    Yu, Zhiwen
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (03) : 579 - 591
  • [42] An iterative approach of protein function prediction
    Chi, Xiaoxiao
    Hou, Jingyu
    BMC BIOINFORMATICS, 2011, 12
  • [43] A New Protein Structure Representation for Efficient Protein Function Prediction
    Maghawry, Huda A.
    Mostafa, Mostafa G. M.
    Gharib, Tarek F.
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2014, 21 (12) : 936 - 946
  • [44] ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network
    Cao, Renzhi
    Freitas, Colton
    Chan, Leong
    Sun, Miao
    Jiang, Haiqing
    Chen, Zhangxin
    MOLECULES, 2017, 22 (10)
  • [45] Protein function prediction using guilty by association from interaction networks
    Piovesan, Damiano
    Giollo, Manuel
    Ferrari, Carlo
    Tosatto, Silvio C. E.
    AMINO ACIDS, 2015, 47 (12) : 2583 - 2592
  • [47] A Framework for Incorporating Functional Interrelationships into Protein Function Prediction Algorithms
    Zhang, Xiao-Fei
    Dai, Dao-Qing
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (03) : 740 - 753
  • [48] IAS: Interaction Specific GO Term Associations for Predicting Protein-Protein Interaction Networks
    Yerneni, Satwica
    Khan, Ishita K.
    Wei, Qing
    Kihara, Daisuke
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2018, 15 (04) : 1247 - 1258
  • [49] ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization
    Huang, Wen-Lin
    Tung, Chun-Wei
    Ho, Shih-Wen
    Hwang, Shiow-Fen
    Ho, Shinn-Ying
    BMC BIOINFORMATICS, 9
  • [50] Exploring the sequence, function, and evolutionary space of protein superfamilies using sequence similarity networks and phylogenetic reconstructions
    Copp, Janine N.
    Anderson, Dave W.
    Akiva, Eyal
    Babbitt, Patricia C.
    Tokuriki, Nobuhiko
    NEW APPROACHES FOR FLAVIN CATALYSIS, 2019, 620 : 315 - 347