O-GlcNAcPRED-DL: Prediction of Protein O-GlcNAcylation Sites Based on an Ensemble Model of Deep Learning

Times cited: 12
Authors
Hu, Fengzhu [1]
Li, Weiyu [2]
Li, Yaoxiang [2]
Hou, Chunyan [2]
Ma, Junfeng [2]
Jia, Cangzhi [1]
Affiliations
[1] Dalian Maritime Univ, Sch Sci, Dalian 116026, Peoples R China
[2] Georgetown Univ, Med Ctr, Lombardi Comprehens Canc Ctr, Washington, DC 20007 USA
Funding
National Natural Science Foundation of China
Keywords
O-GlcNAc; O-GlcNAcylation; deep learning; convolutional neural network (CNN); bidirectional long short-term memory (BiLSTM); sequence analysis; CONVOLUTIONAL NEURAL-NETWORKS; GLYCOSYLATION
DOI
10.1021/acs.jproteome.3c00458
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
O-linked beta-N-acetylglucosamine (O-GlcNAc) is a post-translational modification (i.e., O-GlcNAcylation) on serine/threonine residues of proteins that regulates a plethora of physiological and pathological events. As a dynamic process, O-GlcNAc functions in a site-specific manner. However, the experimental identification of O-GlcNAc sites remains challenging in many scenarios. Herein, by leveraging the recent progress in cataloguing experimentally identified O-GlcNAc sites and advanced deep learning approaches, we establish O-GlcNAcPRED-DL, an ensemble deep learning model for the prediction of O-GlcNAc sites. In brief, to make a benchmark O-GlcNAc data set, we extracted the information on O-GlcNAc from the recently constructed database O-GlcNAcAtlas, which contains thousands of experimentally identified and curated O-GlcNAc sites on proteins from multiple species. To overcome the imbalance between positive and negative data sets, we selected five groups of negative data sets in humans and mice to construct an ensemble predictor based on a convolutional neural network connected to a bidirectional long short-term memory network. By taking into account three types of sequence information, we constructed four network frameworks, with systematically optimized parameters used for the models. A thorough comparison on two independent data sets of humans and mice and six independent data sets from other species demonstrated remarkably increased sensitivity and accuracy of the O-GlcNAcPRED-DL models, which outperformed other existing tools. Moreover, a user-friendly Web server for O-GlcNAcPRED-DL has been constructed, which is freely available at http://oglcnac.org/pred_dl.
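As a rough illustration of the balancing strategy described in the abstract, the following Python sketch splits a larger negative set into five groups, pairs each group with the full positive set, and averages the scores of one model per group. The function names, the disjoint split, the score averaging, and the callable "models" are assumptions made for illustration only; the abstract does not specify these details, and the actual O-GlcNAcPRED-DL models are CNN-BiLSTM networks rather than the placeholders used here.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def make_balanced_groups(
    positives: Sequence[str],
    negatives: Sequence[str],
    n_groups: int = 5,
    seed: int = 0,
) -> List[Tuple[List[str], List[str]]]:
    """Pair the full positive set with each of n_groups negative subsets.

    Assumption: the negative subsets are disjoint and drawn by a seeded
    shuffle; the source only states that five negative groups were used.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    groups = []
    for g in range(n_groups):
        # Take every n_groups-th item so the subsets are disjoint.
        negative_part = shuffled[g::n_groups]
        groups.append((list(positives), negative_part))
    return groups


def ensemble_score(
    models: Sequence[Callable[[str], float]],
    window: str,
) -> float:
    """Average the per-model scores for one sequence window.

    Assumption: plain averaging; the source does not state how the
    individual predictors are combined.
    """
    return mean(model(window) for model in models)
```

In this sketch, one model would be trained on each `(positives, negative_part)` pair, and `ensemble_score` would then be applied to a sequence window centered on a candidate serine or threonine residue.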
Pages: 95-106
Page count: 12