LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model

Cited by: 12
Authors
Pakhrin, Subash C. [1, 2]
Pokharel, Suresh [3]
Pratyush, Pawel [3]
Chaudhari, Meenal [4]
Ismail, Hamid D. [3]
KC, Dukka B. [3]
Affiliations
[1] Wichita State Univ, Sch Comp, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Dept Comp Sci, Houghton, MI 49931 USA
[4] North Carolina A&T State Univ, Dept Biol, Greensboro, NC 27411 USA
Funding
National Science Foundation (USA);
Keywords
post-translational modification; protein language model; phosphorylation; deep learning; stack generalization; score-level fusion; embedding; RESOURCE; ASSOCIATION; DATABASE;
DOI
10.1021/acs.jproteome.2c00667
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although several computational tools exist to predict phosphorylation sites, they have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for general phosphorylation site prediction that integrates embeddings from the local window sequence with the contextualized embedding obtained from a pretrained protein language model using the global (overall) protein sequence to improve prediction performance. LMPhosSite thus consists of two base models: one for capturing an effective local representation and the other for capturing the global per-residue contextualized embedding from a pretrained protein language model. The outputs of these base models are integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthews correlation coefficient, and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, respectively, for the combined serine and threonine independent test data set and 34.90%, 62.03%, 0.298, and 44.67%, respectively, for the tyrosine independent test data set, which is better than the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for the prediction of general phosphorylation sites in proteins.
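As a rough illustration of the score-level fusion idea described in the abstract, the sketch below combines per-site probabilities from two base models. The abstract does not state the fusion rule, so the weighted average used here, the function names `fuse_scores` and `f1_from_precision_recall`, and the example scores are all assumptions made for illustration, not the authors' implementation. The second helper only applies the standard F1 definition to the precision and recall values reported above.

```python
from typing import List, Sequence


def fuse_scores(local_scores: Sequence[float],
                global_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Hypothetical score-level fusion: weighted average of two models' scores.

    `local_scores` stands in for the window-based model's per-site probabilities,
    `global_scores` for the language-model-based model's. The weighting scheme is
    an assumption; the abstract does not specify how the scores are combined.
    """
    if len(local_scores) != len(global_scores):
        raise ValueError("both models must score the same candidate sites")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * l + (1.0 - weight) * g
            for l, g in zip(local_scores, global_scores)]


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (same units as inputs)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Made-up scores for two candidate sites, purely for illustration.
    fused = fuse_scores([0.2, 0.8], [0.6, 0.4])
    print(fused)

    # Precision/recall pairs taken from the abstract (in percent).
    print(f1_from_precision_recall(38.78, 67.12))  # serine/threonine test set
    print(f1_from_precision_recall(34.90, 62.03))  # tyrosine test set
```

Under these assumptions, the F1 values recomputed from the reported precision and recall agree with the reported 49.15% and 44.67% up to rounding.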
Pages: 2548-2557
Page count: 10