LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Cited by: 14

Authors:

Pakhrin, Subash C. ^{[1,2]}

Pokharel, Suresh ^{[3]}

Aoki-Kinoshita, Kiyoko F. ^{[4]}

Beck, Moriah R. ^{[5]}

Dam, Tarun K. ^{[6]}

Caragea, Doina ^{[7]}

Kc, Dukka B. ^{[3]}

Affiliations:

[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA

[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA

[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA

[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan

[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA

[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA

[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA

Source:

GLYCOBIOLOGY | 2023 / Vol. 33 / Issue 05

Funding:

U.S. National Science Foundation

Keywords:

deep learning; N-linked glycosylation; post-translation modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS;

DOI:

10.1093/glycob/cwad033

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology]

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by existing methods, especially with regard to the creation of negative sets and the use of distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. LMNglyPred produces sensitivity, specificity, precision, and accuracy of 76.50%, 75.36%, 60.99%, and 75.74%, respectively, and a Matthews correlation coefficient of 0.49, on a benchmark-independent test set. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.
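The sequon rule stated in the abstract (N, then any residue except proline, then S or T) can be illustrated with a short Python sketch. This is not LMNglyPred itself, only a hypothetical helper for enumerating the candidate sites that a predictor confined to N-X-[S/T] sequons would then score; the function name `find_sequons` and the toy sequence are assumptions made for illustration.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNSS") are all found.
# Pattern: Asn (N), any residue except Pro (P), then Ser (S) or Thr (T).
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of the Asn in every N-X-[S/T] sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MNGSANPTNNSSK"  # toy sequence, not a real protein
    print(find_sequons(toy))  # [2, 9, 10]; N6 is skipped because X is proline
```

Because the sequon is necessary but not sufficient, every position returned here is only a candidate; deciding which candidates are actually glycosylated is the prediction task the abstract describes.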

Pages: 411-422

Page count: 12
