LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Cited by: 14
Authors
Pakhrin, Subash C. [1, 2]
Pokharel, Suresh [3]
Aoki-Kinoshita, Kiyoko F. [4]
Beck, Moriah R. [5]
Dam, Tarun K. [6]
Caragea, Doina [7]
Kc, Dukka B. [3]
Affiliations
[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA
[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan
[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA
[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA
[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; N-linked glycosylation; post-translation modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS
DOI
10.1093/glycob/cwad033
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. Computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is therefore an important problem, one that existing methods have not addressed extensively, particularly with regard to constructing negative sets and leveraging the distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. On an independent benchmark test set, LMNglyPred achieves a sensitivity of 76.50%, a specificity of 75.36%, a precision of 60.99%, an accuracy of 75.74%, and a Matthews Correlation Coefficient of 0.49. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.
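The sequon rule stated in the abstract (N, then any residue except proline, then S or T) can be illustrated with a short scan that lists candidate sites in a sequence. This is only a minimal sketch of candidate enumeration, not the LMNglyPred method itself: deciding which candidates are actually glycosylated is the prediction task the paper addresses. The function name `find_sequons` and the example sequence are hypothetical, introduced here for illustration.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNST") are all found.
# N = asparagine, [^P] = any residue except proline, [ST] = serine or threonine.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of the N in every N-X-[S/T] sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from the paper's data set.
example = "MNGSAANPTQNNST"
print(find_sequons(example))  # "NPT" at position 7 is skipped because X is proline
```

Each returned position marks a candidate asparagine only; per the abstract, the sequon is necessary but not sufficient for glycosylation.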
Pages: 411-422
Page count: 12