Improving protein succinylation sites prediction using embeddings from protein language model

Times cited: 28
Authors
Pokharel, Suresh [1]
Pratyush, Pawel [1]
Heinzinger, Michael [2,3]
Newman, Robert H. [4,5]
Dukka, B. K. C. [1]
Affiliations
[1] Michigan Technol Univ, Dept Comp Sci, Houghton, MI 49931 USA
[2] TUM Tech Univ Munich, Dept Informat Bioinformat & Computat Biol I12, Boltzmannstr 3, D-85748 Garching, Germany
[3] Ctr Doctoral Studies Informat & Its Applicat CeDo, TUM Grad Sch, Boltzmannstr 11, D-85748 Garching, Germany
[4] North Carolina A&T State Univ, Coll Sci & Technol, Dept Biol, Greensboro, NC USA
[5] Univ N Carolina, Dept Chem, Chapel Hill, NC 27515 USA
Funding
U.S. National Science Foundation
Keywords
Lysine succinylation; UniRef
DOI
10.1038/s41598-022-21366-2
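Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification codes
07; 0710; 09
Abstract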
Protein succinylation is an important post-translational modification (PTM) involved in many vital metabolic activities in cells, including cellular respiration, regulation, and repair. Here, we present a novel approach that combines features from a supervised word embedding with embeddings from the protein language model ProtT5-XL-UniRef50 (hereafter ProtT5) in a deep learning framework to predict protein succinylation sites. To our knowledge, this is one of the first attempts to use embeddings from a pre-trained protein language model to predict protein succinylation sites. The proposed model, dubbed LMSuccSite, achieves state-of-the-art performance relative to existing methods, with a Matthews correlation coefficient (MCC) of 0.36, a sensitivity of 0.79, and a specificity of 0.79. LMSuccSite is likely to serve as a valuable resource for exploring succinylation and its role in cellular physiology and disease.
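The three scores reported in the abstract (MCC, sensitivity, specificity) are standard statistics computed from a binary confusion matrix. The sketch below is a minimal illustration of those formulas, not the authors' evaluation code; the function name `classification_metrics` and the counts in the usage line are hypothetical, chosen only for illustration (a 1:10 positive:negative ratio) and not taken from the paper.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; returns 0.0 for any metric
    whose denominator is zero.
    """
    # Sensitivity (recall, true positive rate): fraction of real sites found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (true negative rate): fraction of non-sites correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: balanced summary over all four cells.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Invented counts: 100 positive and 1000 negative test sites.
    m = classification_metrics(tp=79, fp=210, tn=790, fn=21)
    print(f"{m['sensitivity']:.2f} {m['specificity']:.2f} {m['mcc']:.2f}")
    # prints "0.79 0.79 0.38"
```

With these made-up counts, sensitivity and specificity are both 0.79 while MCC is only about 0.38. This illustrates why MCC can sit well below the per-class rates when negatives greatly outnumber positives: the 210 false positives weigh heavily against only 79 true positives.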
Pages: 13