LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Cited by: 14
Authors
Pakhrin, Subash C. [1 ,2 ]
Pokharel, Suresh [3 ]
Aoki-Kinoshita, Kiyoko F. [4 ]
Beck, Moriah R. [5 ]
Dam, Tarun K. [6 ]
Caragea, Doina [7 ]
Kc, Dukka B. [3 ]
Affiliations
[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA
[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan
[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA
[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA
[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; N-linked glycosylation; post-translation modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS;
DOI
10.1093/glycob/cwad033
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by the existing methods, especially with regard to the creation of negative sets and leveraging the distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. On a benchmark-independent test set, LMNglyPred achieves a sensitivity of 76.50%, a specificity of 75.36%, a precision of 60.99%, an accuracy of 75.74%, and a Matthews Correlation Coefficient of 0.49. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.
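The sequon rule stated in the abstract (N, then any residue except proline, then S or T) can be illustrated with a short scan that lists candidate sites in a sequence. This is only a minimal sketch of candidate enumeration, not the LMNglyPred method itself: the function name `find_sequons` and the toy sequence are hypothetical, and a match only marks a position as a candidate, since the abstract notes that the sequon is necessary but not sufficient for glycosylation.

```python
from __future__ import annotations

import re

# N, then any character other than P, then S or T.
# A lookahead is used so that overlapping sequons (e.g. "NNTS") are all found.
# Note: [^P] also accepts non-standard symbols such as "X"; the input is
# assumed to be a plain one-letter amino acid sequence.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MNGTANPSQNNTSK"  # made-up sequence, for illustration only
    print(find_sequons(toy))  # [2, 10, 11]; the N at position 6 is followed by P
```

Positions returned this way would be the inputs to a downstream classifier; deciding which of them are actually glycosylated is the prediction problem the abstract describes.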
Pages: 411-422
Page count: 12