Improving protein function prediction using protein sequence and GO-term similarities

Times cited: 25
Authors
Makrodimitris, Stavros [1, 2]
van Ham, Roeland C. H. J. [1, 2]
Reinders, Marcel J. T. [1]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, Dept Intelligent Syst, NL-2628 CD Delft, Netherlands
[2] Keygene NV, Dept Bioinformat, NL-6708 PW Wageningen, Netherlands
Keywords
GENE ONTOLOGY; CLASSIFICATION
DOI
10.1093/bioinformatics/bty751
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Most automatic functional annotation methods assign Gene Ontology (GO) terms to proteins based on the annotations of highly similar proteins. We argue that less similar proteins are still informative. Moreover, despite their simplicity and structure, GO terms appear to be hard for computers to learn, in particular those of the Biological Process ontology, which has the most terms (>29 000). We propose using Label-Space Dimensionality Reduction (LSDR) techniques to exploit the redundancy of GO terms and transform them into a more compact latent representation that is easier to predict.
Results: We compare proteins using their sequence similarity profiles (SSPs), i.e. their sequence similarities to a set of annotated training proteins. We introduce two new LSDR methods, one based on the structure of the GO and one based on the semantic similarity of terms. We show that these LSDR methods, as well as three existing ones, improve the Critical Assessment of Functional Annotation (CAFA) performance of several function prediction algorithms. Cross-validation experiments on Arabidopsis thaliana proteins demonstrate the superiority of our GO-aware LSDR over generic LSDR. Our experiments on A. thaliana proteins also show that the SSP representation combined with a kNN classifier outperforms state-of-the-art and baseline methods in terms of cross-validated F-measure.
Availability and implementation: Source code for the experiments is available at https://github.com/stamakro/SSP-LSDR.
Supplementary information: Supplementary data are available at Bioinformatics online.
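The abstract pairs a sequence similarity profile (SSP) with a kNN classifier and evaluates it by F-measure. The sketch below illustrates that idea under assumptions that are ours, not the paper's: each profile holds hypothetical BLAST-style bit scores against the training proteins (0.0 where there is no hit), neighbours are found by Euclidean distance, a GO term's score is the fraction of the k nearest training proteins annotated with it, and the F-measure is computed as the protein-centric Fmax used in CAFA. The function names (`ssp`, `knn_predict`, `fmax`) and all toy data are hypothetical, and the LSDR step is not shown; for the authors' actual implementation, see the SSP-LSDR repository.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical toy data: similarity scores (e.g. BLAST bit scores) between proteins.
TRAINING_IDS = ["P1", "P2", "P3"]
TRAIN_HITS = {
    "P1": {"P1": 100.0, "P2": 80.0},
    "P2": {"P1": 80.0, "P2": 100.0},
    "P3": {"P3": 100.0},
}
TRAIN_TERMS = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:A"}, "P3": {"GO:C"}}


def ssp(hits: Dict[str, float], training_ids: Sequence[str]) -> List[float]:
    """Sequence similarity profile: one score per training protein, 0.0 if no hit."""
    return [hits.get(t, 0.0) for t in training_ids]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query: Sequence[float], profiles: Dict[str, List[float]],
                terms: Dict[str, Set[str]], k: int) -> Dict[str, float]:
    """Score each GO term by the fraction of the k nearest training proteins carrying it."""
    nearest = sorted(profiles, key=lambda p: euclidean(query, profiles[p]))[:k]
    scores: Dict[str, float] = {}
    for p in nearest:
        for term in terms[p]:
            scores[term] = scores.get(term, 0.0) + 1.0 / k
    return scores


def fmax(predictions: Dict[str, Dict[str, float]], truth: Dict[str, Set[str]],
         thresholds: Sequence[float]) -> float:
    """Protein-centric maximum F-measure over decision thresholds (CAFA-style).

    Assumes the true term sets are already propagated to their GO ancestors.
    """
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for prot, true_terms in truth.items():
            predicted = {g for g, s in predictions.get(prot, {}).items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:  # precision averages only proteins with >= 1 prediction
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


profiles = {p: ssp(TRAIN_HITS[p], TRAINING_IDS) for p in TRAINING_IDS}
query = ssp({"P1": 60.0, "P2": 55.0}, TRAINING_IDS)  # moderate hits only
scores = knn_predict(query, profiles, TRAIN_TERMS, k=2)
print(sorted(scores.items()))  # [('GO:A', 1.0), ('GO:B', 0.5)]
print(fmax({"Q": scores}, {"Q": {"GO:A"}}, [0.1, 0.5, 0.9]))  # 1.0
```

Because the query is compared through its whole profile rather than only its best hit, a protein with only moderate similarity to the training set still receives neighbours and term scores, in line with the abstract's point that less similar proteins are informative.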
Pages: 1116-1124
Number of pages: 9
Related papers (50 in total)
  • [21] Improving Protein Structural Class Prediction Using Novel Combined Sequence Information and Predicted Secondary Structural Features
    Dai, Qi
    Wu, Li
    Li, Lihua
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2011, 32 (16) : 3393 - 3398
  • [22] Multi-function Prediction of Unknown Protein Sequences Using Multilabel Classifiers and Augmented Sequence Features
    Agrawal, Saurabh
    Sisodia, Dilip Singh
    Nagwani, Naresh Kumar
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY TRANSACTION A-SCIENCE, 2021, 45 (04): 1177 - 1189
  • [23] SDN2GO: An Integrated Deep Learning Model for Protein Function Prediction
    Cai, Yideng
    Wang, Jiacheng
    Deng, Lei
    FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, 2020, 8
  • [24] QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs
    Smaili, Fatima Zohra
    Tian, Shuye
    Roy, Ambrish
    Alazmi, Meshari
    Arold, Stefan T.
    Mukherjee, Srayanta
    Hefty, P. Scott
    Chen, Wei
    Gao, Xin
    GENOMICS PROTEOMICS & BIOINFORMATICS, 2021, 19 (06) : 998 - 1011
  • [25] MultiPredGO: Deep Multi-Modal Protein Function Prediction by Amalgamating Protein Structure, Sequence, and Interaction Information
    Giri, Swagarika Jaharlal
    Dutta, Pratik
    Halani, Parth
    Saha, Sriparna
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (05) : 1832 - 1838
  • [26] Improving prediction of heterodimeric protein complexes using combination with pairwise kernel
    Ruan, Peiying
    Hayashida, Morihiro
    Akutsu, Tatsuya
    Vert, Jean-Philippe
    BMC BIOINFORMATICS, 2018, 19
  • [27] Protein function annotation using protein domain family resources
    Das, Sayoni
    Orengo, Christine A.
    METHODS, 2016, 93 : 24 - 34
  • [28] Protein function prediction using functional inter-relationship
    Dhanuka, Richa
    Singh, Jyoti Prakash
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2021, 95 (95)
  • [29] Protein Function Prediction Using Deep Restricted Boltzmann Machines
    Zou, Xianchun
    Wang, Guijun
    Yu, Guoxian
    BIOMED RESEARCH INTERNATIONAL, 2017, 2017
  • [30] Protein Function Prediction Using Adaptive Swarm Based Algorithm
    Chowdhury, Archana
    Konar, Amit
    Rakshit, Pratyusha
    Janarthanan, Ramadoss
    SWARM, EVOLUTIONARY, AND MEMETIC COMPUTING, PT II (SEMCCO 2013), 2013, 8298 : 55 - 68