Improving protein function prediction using protein sequence and GO-term similarities

Cited by: 25
Authors
Makrodimitris, Stavros [1, 2]
van Ham, Roeland C. H. J. [1, 2]
Reinders, Marcel J. T. [1]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, Dept Intelligent Syst, NL-2628 CD Delft, Netherlands
[2] Keygene NV, Dept Bioinformat, NL-6708 PW Wageningen, Netherlands
Keywords
GENE ONTOLOGY; CLASSIFICATION;
DOI
10.1093/bioinformatics/bty751
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Most automatic functional annotation methods assign Gene Ontology (GO) terms to proteins based on annotations of highly similar proteins. We advocate that proteins that are less similar are still informative. Also, despite their simplicity and structure, GO terms seem to be hard for computers to learn, in particular the Biological Process ontology, which has the most terms (>29000). We propose to use Label-Space Dimensionality Reduction (LSDR) techniques to exploit the redundancy of GO terms and transform them into a more compact latent representation that is easier to predict.
Results: We compare proteins using a sequence similarity profile (SSP) to a set of annotated training proteins. We introduce two new LSDR methods, one based on the structure of the GO, and one based on semantic similarity of terms. We show that these LSDR methods, as well as three existing ones, improve the Critical Assessment of Functional Annotation performance of several function prediction algorithms. Cross-validation experiments on Arabidopsis thaliana proteins pinpoint the superiority of our GO-aware LSDR over generic LSDR. Our experiments on A. thaliana proteins show that the SSP representation in combination with a kNN classifier outperforms state-of-the-art and baseline methods in terms of cross-validated F-measure.
Availability and implementation: Source code for the experiments is available at https://github.com/stamakro/SSP-LSDR.
Supplementary information: Supplementary data are available at Bioinformatics online.
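
The pipeline outlined in the abstract (an SSP feature vector per protein, a reduced label space, and a kNN predictor) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, which is in the repository linked above. It uses a generic PCA-style LSDR rather than the GO-aware variants the paper introduces, and all function names and the toy similarity and annotation values are hypothetical.

```python
import numpy as np

def fit_lsdr(Y, k):
    """Generic PCA-style label-space reduction (assumption: not GO-aware).
    Returns the label mean and a (n_terms, k) projection matrix."""
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, Vt[:k].T

def knn_predict(S_train, Z_train, S_test, n_neighbors):
    """Average the latent labels of the nearest training proteins,
    using Euclidean distance between similarity profiles."""
    d = np.linalg.norm(S_test[:, None, :] - S_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :n_neighbors]
    return Z_train[idx].mean(axis=1)

def predict_go_scores(S_train, Y_train, S_test, k=2, n_neighbors=3):
    """Encode labels, predict in the latent space, decode back to GO-term scores."""
    mean, V = fit_lsdr(Y_train, k)
    Z_train = (Y_train - mean) @ V
    Z_test = knn_predict(S_train, Z_train, S_test, n_neighbors)
    return Z_test @ V.T + mean

# Toy data (made up): each row of S is a protein's similarity profile
# to the four training proteins; each column of Y is one GO term.
S_train = np.array([[1.0, 0.8, 0.1, 0.0],
                    [0.8, 1.0, 0.2, 0.1],
                    [0.1, 0.2, 1.0, 0.7],
                    [0.0, 0.1, 0.7, 1.0]])
Y_train = np.array([[1, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1],
                    [0, 1, 1]], dtype=float)
S_test = np.array([[0.9, 0.9, 0.1, 0.0]])

scores = predict_go_scores(S_train, Y_train, S_test, k=2, n_neighbors=2)
print(scores.round(3))
```

The decoded scores are real-valued, so a threshold would still be needed to turn them into GO-term assignments; how that threshold is chosen is not covered by this sketch.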
Pages: 1116-1124
Page count: 9
Related papers
50 in total
  • [31] Multitask Protein Function Prediction through Task Dissimilarity
    Frasca, Marco
    Bianchi, Nicolo Cesa
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (05) : 1550 - 1560
  • [32] Protein function prediction using functional inter-relationship
    Dhanuka, Richa
    Singh, Jyoti Prakash
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2021, 95 (95)
  • [33] Integrating protein-protein interactions and text mining for protein function prediction
    Samira Jaeger
    Sylvain Gaudan
    Ulf Leser
    Dietrich Rebholz-Schuhmann
    BMC Bioinformatics, 9
  • [34] NetGO: improving large-scale protein function prediction with massive network information
    You, Ronghui
    Yao, Shuwei
    Xiong, Yi
    Huang, Xiaodi
    Sun, Fengzhu
    Mamitsuka, Hiroshi
    Zhu, Shanfeng
    NUCLEIC ACIDS RESEARCH, 2019, 47 (W1) : W379 - W387
  • [35] Majority Vote Cascading: A Semi-Supervised Framework for Improving Protein Function Prediction
    Lazarsfeld, John
    Rodriguez, Jonathan
    Erden, Mert
    Liu, Yuelin
    Cowen, Lenore J.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (04) : 1933 - 1945
  • [36] INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity
    Piovesan, Damiano
    Giollo, Manuel
    Leonardi, Emanuela
    Ferrari, Carlo
    Tosatto, Silvio C. E.
    NUCLEIC ACIDS RESEARCH, 2015, 43 (W1) : W134 - W140
  • [37] Protein Function Prediction based on Physiochemical Properties and Protein Granularity
    Wang, Wanlu
    Meng, Jun
    Zhang, Xin
    Luan, Yushi
    2013 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC), 2013, : 342 - 346
  • [38] Majority Vote Cascading: A Semi-Supervised Framework for Improving Protein Function Prediction
    Lazarsfeld, John
    Rodriguez, Jonathan
    Erden, Mert
    Liu, Yuelin
    Cowen, Lenore J.
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019, : 51 - 60
  • [39] Protein functional class prediction using global encoding of amino acid sequence
    Li, Xi
    Liao, Bo
    Shu, Yu
    Zeng, Qingguang
    Luo, Jiawei
    JOURNAL OF THEORETICAL BIOLOGY, 2009, 261 (02) : 290 - 293
  • [40] An iterative approach of protein function prediction
    Xiaoxiao Chi
    Jingyu Hou
    BMC Bioinformatics, 12