O-GlcNAcPRED-DL: Prediction of Protein O-GlcNAcylation Sites Based on an Ensemble Model of Deep Learning

Cited by: 12
Authors
Hu, Fengzhu [1]
Li, Weiyu [2]
Li, Yaoxiang [2]
Hou, Chunyan [2]
Ma, Junfeng [2]
Jia, Cangzhi [1]
Affiliations
[1] Dalian Maritime Univ, Sch Sci, Dalian 116026, Peoples R China
[2] Georgetown Univ, Med Ctr, Lombardi Comprehens Canc Ctr, Washington, DC 20007 USA
Funding
National Natural Science Foundation of China
Keywords
O-GlcNAc; O-GlcNAcylation; deep learning; convolutional neural network (CNN); bidirectional long short-term memory (BiLSTM); sequence analysis; CONVOLUTIONAL NEURAL-NETWORKS; GLYCOSYLATION
DOI
10.1021/acs.jproteome.3c00458
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
O-linked beta-N-acetylglucosamine (O-GlcNAc) is a post-translational modification (i.e., O-GlcNAcylation) on serine/threonine residues of proteins, regulating a plethora of physiological and pathological events. As a dynamic process, O-GlcNAc functions in a site-specific manner. However, the experimental identification of the O-GlcNAc sites remains challenging in many scenarios. Herein, by leveraging the recent progress in cataloguing experimentally identified O-GlcNAc sites and advanced deep learning approaches, we establish an ensemble model, O-GlcNAcPRED-DL, a deep learning-based tool, for the prediction of O-GlcNAc sites. In brief, to make a benchmark O-GlcNAc data set, we extracted the information on O-GlcNAc from the recently constructed database O-GlcNAcAtlas, which contains thousands of experimentally identified and curated O-GlcNAc sites on proteins from multiple species. To overcome the imbalance between positive and negative data sets, we selected five groups of negative data sets in humans and mice to construct an ensemble predictor based on connection of a convolutional neural network and bidirectional long short-term memory. By taking into account three types of sequence information, we constructed four network frameworks, with the systematically optimized parameters used for the models. The thorough comparison analysis on two independent data sets of humans and mice and six independent data sets from other species demonstrated remarkably increased sensitivity and accuracy of the O-GlcNAcPRED-DL models, outperforming other existing tools. Moreover, a user-friendly Web server for O-GlcNAcPRED-DL has been constructed, which is freely available at http://oglcnac.org/pred_dl.
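The imbalance-handling idea in the abstract (several negative groups, one model per group, predictions combined into an ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation: the random partitioning of negatives, the averaging rule, and the names `split_negatives`, `build_balanced_sets`, and `ensemble_score` are all assumptions made for illustration, and each "model" is simply any callable that returns a probability for a sequence window.

```python
import random
from statistics import mean


def split_negatives(negatives, k=5, seed=0):
    """Shuffle the negative samples and deal them into k roughly equal groups.

    Assumption: the groups are a random partition; the abstract does not
    say how the five negative groups were actually selected.
    """
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def build_balanced_sets(positives, negatives, k=5, seed=0):
    """Pair the full positive set with each negative group.

    Returns k (positives, negative_group) training sets, one per sub-model.
    """
    groups = split_negatives(negatives, k, seed)
    return [(list(positives), group) for group in groups]


def ensemble_score(models, window):
    """Average the probabilities returned by the sub-models for one window.

    Assumption: simple averaging; other combination rules are possible.
    """
    return mean(model(window) for model in models)
```

In this sketch, each training set returned by `build_balanced_sets` would be used to fit one sub-model (for example a CNN-BiLSTM network), and `ensemble_score` would combine their outputs for a candidate serine or threonine site.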
Pages: 95-106
Number of pages: 12