LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Times cited: 14

Authors:

Pakhrin, Subash C. [1, 2]

Pokharel, Suresh [3]

Aoki-Kinoshita, Kiyoko F. [4]

Beck, Moriah R. [5]

Dam, Tarun K. [6]

Caragea, Doina [7]

Kc, Dukka B. [3]

Affiliations:

[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA

[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA

[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA

[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan

[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA

[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA

[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA

Source:

GLYCOBIOLOGY | 2023 / Vol. 33 / Issue 5

Funding:

U.S. National Science Foundation;

Keywords:

deep learning; N-linked glycosylation; post-translational modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS

DOI:

10.1093/glycob/cwad033

Chinese Library Classification (CLC) numbers:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Subject classification codes:

071010; 081704;

Abstract:

Protein N-linked glycosylation is an important post-translational modification in Homo sapiens that plays essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; the sequon is therefore a necessary but not sufficient determinant of N-linked glycosylation. Computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is thus an important problem that existing methods have not extensively addressed, especially with regard to constructing negative sets and leveraging the information distilled by protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach that predicts N-linked glycosylation sites in human proteins using embeddings from a pre-trained pLM. On an independent benchmark test set, LMNglyPred achieves a sensitivity of 76.50%, a specificity of 75.36%, a Matthews correlation coefficient (MCC) of 0.49, a precision of 60.99%, and an accuracy of 75.74%. These results demonstrate that LMNglyPred is a robust computational tool for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon.
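The candidate-site definition in the abstract (an asparagine, then any residue except proline, then serine or threonine) can be made concrete with a regular expression. This is a minimal sketch, not LMNglyPred's code: the function name `find_sequons` and the choice to report 1-based positions are illustrative assumptions. A zero-width lookahead is used so that overlapping sequons, such as the two in "NNST", are both reported.

```python
import re

# Zero-width lookahead: each match consumes no characters, so overlapping
# sequons (e.g. "NNST" contains N-N-S and N-S-T) are all found.
# [^P] encodes "X can be any amino acid except proline".
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that start an N-X-[S/T] sequon.

    Hypothetical helper for illustration; the input is upper-cased so that
    lower-case one-letter sequences are handled too.
    """
    seq = sequence.upper()
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


if __name__ == "__main__":
    print(find_sequons("MNGSNPTANST"))  # prints [2, 9]
```

Here the N at position 5 is skipped because it is followed by proline. Each reported position is only a candidate: as the abstract notes, the sequon is necessary but not sufficient, so a predictor is still needed to decide which candidates are actually glycosylated.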
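The five figures reported above are standard binary-classification metrics computed from a confusion matrix of predicted versus true labels on the test sequons. The sketch below uses the usual textbook definitions (an assumption; the abstract does not spell out its formulas), and the counts in the usage line are a toy example, not the paper's data. Note that MCC ranges from -1 to 1 and is reported as a coefficient, whereas the other four are reported as percentages.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes are present and
    at least one site is predicted positive and one negative.
    """
    sensitivity = tp / (tp + fn)  # recall on glycosylated (positive) sites
    specificity = tn / (tn + fp)  # recall on non-glycosylated (negative) sites
    precision = tp / (tp + fp)    # share of predicted positives that are real
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy counts for illustration only.
    print(classification_metrics(tp=3, fp=2, tn=6, fn=1))
```

With these toy counts, sensitivity, specificity, and accuracy are all 0.75, precision is 0.6, and MCC is about 0.478; multiplying the first four by 100 gives percentages in the form the abstract uses.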

Pages: 411–422

Page count: 12
