LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model

Times cited: 15
Authors
Pakhrin, Subash C. [1 ,2 ]
Pokharel, Suresh [3 ]
Pratyush, Pawel [3 ]
Chaudhari, Meenal [4 ]
Ismail, Hamid D. [3 ]
Dukka, B. K. C. B. [3 ]
Affiliations
[1] Wichita State Univ, Sch Comp, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Dept Comp Sci, Houghton, MI 49931 USA
[4] North Carolina A&T State Univ, Dept Biol, Greensboro, NC 27411 USA
Funding
National Science Foundation (USA)
Keywords
post-translational modification; protein language model; phosphorylation; deep learning; stack generalization; score-level fusion; embedding; RESOURCE; ASSOCIATION; DATABASE
DOI
10.1021/acs.jproteome.2c00667
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although several computational tools exist to predict phosphorylation sites, existing tools have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for general phosphorylation site prediction. It integrates embeddings from the local window sequence with the contextualized embedding obtained from the global (overall) protein sequence by a pretrained protein language model to improve prediction performance. LMPhosSite therefore consists of two base models: one captures an effective local representation, and the other captures the global per-residue contextualized embedding from a pretrained protein language model. The outputs of these base models are integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthews correlation coefficient (MCC), and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, respectively, on the combined serine and threonine independent test data set, and 34.90%, 62.03%, 0.298, and 44.67%, respectively, on the tyrosine independent test data set, outperforming the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for predicting general phosphorylation sites in proteins.
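The abstract describes two inputs for each candidate site: a local window of residues centred on the site, and the per-residue contextualized embedding that a pretrained protein language model produces when run over the whole protein. The Python sketch below illustrates both inputs under stated assumptions: the half-window size `HALF_WINDOW = 16` (a 33-residue window), the padding symbol `-`, and the helper names `local_windows` and `site_embedding` are hypothetical choices for illustration, not values confirmed by this record, and the embedding matrix is a stand-in for whatever the language model actually outputs.

```python
from typing import List, Sequence, Tuple

# Hypothetical settings for illustration; the record does not give the window size.
HALF_WINDOW = 16  # residues kept on each side of the site -> 33-residue window
PAD = "-"         # filler where the window runs past either end of the protein


def local_windows(seq: str, targets: str = "STY",
                  half: int = HALF_WINDOW) -> List[Tuple[int, str]]:
    """Return (0-based position, window) for each candidate residue in seq.

    Every window has length 2 * half + 1 with the candidate in the middle.
    """
    padded = PAD * half + seq + PAD * half
    windows = []
    for i, residue in enumerate(seq):
        if residue in targets:
            # Residue i sits at index i + half in `padded`, so this slice
            # keeps `half` symbols on each side of it.
            windows.append((i, padded[i:i + 2 * half + 1]))
    return windows


def site_embedding(per_residue: Sequence[Sequence[float]], pos: int) -> List[float]:
    """Return the contextualized vector for residue `pos`.

    `per_residue` stands in for the L x d matrix a protein language model
    produces for the full sequence (one row per residue); hypothetical input.
    """
    if not 0 <= pos < len(per_residue):
        raise IndexError("site position is outside the protein")
    return list(per_residue[pos])
```

For example, `local_windows("MKSTAY", half=3)` returns `[(2, '-MKSTAY'), (3, 'MKSTAY-'), (5, 'STAY---')]`. Passing `targets="ST"` or `targets="Y"` separates serine/threonine candidates from tyrosine candidates, mirroring the separate test sets reported in the abstract.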
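The outputs of the two base models are combined by score-level fusion. The abstract does not state the fusion rule, and the keywords mention stack generalization, so the published model may learn the combination with a trained meta-classifier. The sketch below swaps in the simplest score-level rule instead: a fixed weighted average of the two predicted probabilities followed by a threshold. The function names, the weight `w_local = 0.5`, and the threshold `0.5` are illustrative assumptions.

```python
from typing import List, Sequence


def fuse_scores(p_local: Sequence[float], p_global: Sequence[float],
                w_local: float = 0.5) -> List[float]:
    """Weighted average of two base models' site probabilities.

    Assumption for illustration: a fixed convex combination. A stacked
    meta-learner would replace this rule with a trained model.
    """
    if len(p_local) != len(p_global):
        raise ValueError("both base models must score the same sites")
    if not 0.0 <= w_local <= 1.0:
        raise ValueError("w_local must lie in [0, 1]")
    return [w_local * a + (1.0 - w_local) * b for a, b in zip(p_local, p_global)]


def predict(fused: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a site as phosphorylated (1) when its fused score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in fused]
```

For example, `predict(fuse_scores([0.9, 0.2, 0.6], [0.7, 0.4, 0.3]))` returns `[1, 0, 0]`. A fixed average needs no extra training data but cannot adapt the weighting; a learned combiner fitted on held-out base-model scores is the stacking alternative.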
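The four reported metrics follow from the confusion-matrix counts (true/false positives and negatives) on each independent test set. The sketch below computes them with the standard definitions; the function names are illustrative. Because F1 is the harmonic mean of precision and recall, the reported F1-scores can be cross-checked from the reported precision and recall. MCC cannot be checked this way, since it also depends on the true-negative count, which the abstract does not give.

```python
import math
from typing import Dict


def f1_from_pr(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1 and Matthews correlation coefficient from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC uses all four cells, so it cannot be recovered from precision and recall alone.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1_from_pr(precision, recall), "mcc": mcc}


# Cross-check the F1-scores from the precision and recall reported in the abstract.
print(round(f1_from_pr(0.3878, 0.6712), 4))  # S/T test set: prints 0.4916
print(round(f1_from_pr(0.3490, 0.6203), 4))  # Y test set: prints 0.4467
```

The printed values, 0.4916 and 0.4467, agree with the reported 49.15% and 44.67% to within the rounding of the published precision and recall.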
Pages: 2548-2557
Page count: 10