LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Times cited: 14
Authors
Pakhrin, Subash C. [1 ,2 ]
Pokharel, Suresh [3 ]
Aoki-Kinoshita, Kiyoko F. [4 ]
Beck, Moriah R. [5 ]
Dam, Tarun K. [6 ]
Caragea, Doina [7 ]
Kc, Dukka B. [3 ]
Affiliations
[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA
[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan
[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA
[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA
[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; N-linked glycosylation; post-translational modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS
DOI
10.1093/glycob/cwad033
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by existing methods, especially with regard to the construction of negative sets and the use of information distilled by protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach that predicts N-linked glycosylation sites in human proteins using embeddings from a pre-trained pLM. On an independent benchmark test set, LMNglyPred achieves a sensitivity of 76.50%, a specificity of 75.36%, a Matthews correlation coefficient (MCC) of 0.49, a precision of 60.99%, and an accuracy of 75.74%. These results demonstrate that LMNglyPred is a robust computational tool for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon.
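The abstract rests on two technical ingredients: the N-X-[S/T] sequon rule that defines which asparagines are candidate sites, and the confusion-matrix metrics used to report performance. The Python sketch below illustrates both under stated assumptions; it is not LMNglyPred's code and does not reproduce its pLM-based classifier. Candidates are enumerated with a regular-expression lookahead so that overlapping sequons (e.g. `NNST`) are all found, and the metrics use their standard definitions. The names `find_sequons` and `binary_metrics` and the example counts are hypothetical; the abstract does not report the underlying confusion matrix.

```python
import re
from math import sqrt

# Candidate N-linked sites: N-X-[S/T], where X is any residue except proline.
# A zero-width lookahead is used so overlapping sequons (e.g. "NNST") all match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of the Asn (N) in every N-X-[S/T] sequon.

    Hypothetical helper for illustration: it only enumerates candidates
    and does not predict which of them are actually glycosylated.
    """
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics (the counts passed in are hypothetical)."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    pre = tp / (tp + fp)                     # precision
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient, ranging from -1 to 1
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Pre": pre, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    print(find_sequons("MANGTNPS"))  # Asn at 3 qualifies; Asn at 6 is followed by P
    print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
```

Because MCC is a correlation coefficient rather than a proportion, it lies on a -1 to 1 scale, which is why the abstract reports it as 0.49 while the other metrics are given as percentages.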
Pages: 411-422
Number of pages: 12
Related papers (48 in total)
  • [31] Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models
    Richter-Pechanski, Phillip
    Geis, Nicolas A.
    Kiriakou, Christina
    Schwab, Dominic M.
    Dieterich, Christoph
    DIGITAL HEALTH, 2021, 7
  • [32] Recent advances in features generation for membrane protein sequences: From multiple sequence alignment to pre-trained language models
    Ou, Yu-Yen
    Ho, Quang-Thai
    Chang, Heng-Ta
    PROTEOMICS, 2023, 23 (23-24)
  • [33] Transfer-DDG: Prediction of protein-protein binding affinity changes with mutations based on large pre-trained model transfer learning
    Wang, Yuxiang
    Shi, Xiumin
    Zhou, Han
    2023 IEEE 2ND INDUSTRIAL ELECTRONICS SOCIETY ANNUAL ON-LINE CONFERENCE, ONCON, 2023
  • [34] CasPro-ESM2: Accurate identification of Cas proteins integrating pre-trained protein language model and multi-scale convolutional neural network
    Yan, Chaorui
    Zhang, Zilong
    Xu, Junlin
    Meng, Yajie
    Yan, Shankai
    Wei, Leyi
    Zou, Quan
    Zhang, Qingchen
    Cui, Feifei
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2025, 308
  • [36] VesiMCNN: Using pre-trained protein language models and multiple window scanning convolutional neural networks to identify vesicular transport proteins
    Le, Van The
    Tseng, Yi-Hsuan
    Liu, Yu-Chen
    Malik, Muhammad Shahid
    Ou, Yu-Yen
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2024, 280
  • [37] Human Monkeypox Classification from Skin Lesion Images with Deep Pre-trained Network using Mobile Application
    Sahin, Veysel Harun
    Oztel, Ismail
    Yolcu Oztel, Gozde
    JOURNAL OF MEDICAL SYSTEMS, 2022, 46 (11)
  • [38] Exploiting pglB Oligosaccharyltransferase-Positive and -Negative Campylobacter jejuni and a Multiprotease Digestion Strategy to Identify Novel Sites Modified by N-Linked Protein Glycosylation
    Cain, Joel A.
    Dale, Ashleigh L.
    Cordwell, Stuart J.
    JOURNAL OF PROTEOME RESEARCH, 2021, 20 (11) : 4995 - 5009
  • [39] LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model
    Pakhrin, Subash C.
    Pokharel, Suresh
    Pratyush, Pawel
    Chaudhari, Meenal
    Ismail, Hamid D.
    Dukka, B. K. C. B.
    JOURNAL OF PROTEOME RESEARCH, 2023, 22 (08) : 2548 - 2557
  • [40] Identification of Important N-Linked Glycosylation Sites in the Hemagglutinin Protein and Their Functional Impact on DC-SIGN Mediated Avian Influenza H5N1 Infection
    Yang, Zih-Syuan
    Huang, Szu-Wei
    Wang, Wen-Hung
    Lin, Chih-Yen
    Wang, Chu-Feng
    Urbina, Aspiro Nayim
    Thitithanyanont, Arunee
    Tseng, Sung-Pin
    Lu, Po-Liang
    Chen, Yen-Hsu
    Wang, Sheng-Fan
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2021, 22 (02) : 1 - 22