DeepKhib: A Deep-Learning Framework for Lysine 2-Hydroxyisobutyrylation Sites Prediction

Cited by: 15
Authors
Zhang, Luna [1]
Zou, Yang [2]
He, Ningning [2]
Chen, Yu [1]
Chen, Zhen [3, 4]
Li, Lei [1, 2]
Affiliations
[1] Qingdao Univ, Sch Data Sci & Software Engn, Qingdao, Peoples R China
[2] Qingdao Univ, Sch Basic Med, Qingdao, Peoples R China
[3] Henan Agr Univ, Collaborat Innovat Ctr Henan Grain Crops, Zhengzhou, Peoples R China
[4] Henan Agr Univ, Key Lab Rice Biol Henan Prov, Zhengzhou, Peoples R China
Source
FRONTIERS IN CELL AND DEVELOPMENTAL BIOLOGY | 2020 / Volume 8
Funding
National Natural Science Foundation of China;
Keywords
post-translational modification; lysine; 2-hydroxyisobutyrylation; deep learning; modification site prediction; machine learning; CD-HIT; CROTONYLATION; SELECTION; PROTEIN; SETS;
DOI
10.3389/fcell.2020.580217
Chinese Library Classification
Q2 [Cell Biology];
Discipline classification codes
071009; 090102;
Abstract
As a novel type of post-translational modification, lysine 2-hydroxyisobutyrylation (Khib) plays an important role in gene transcription and signal transduction. Recognizing Khib sites is the essential first step toward understanding its regulatory mechanism. Thousands of Khib sites have been experimentally verified across five different species. However, only a couple of traditional machine-learning algorithms have been developed to predict Khib sites, and only for a limited number of species; a general prediction algorithm is lacking. We constructed a deep-learning algorithm based on a convolutional neural network with the one-hot encoding approach, dubbed CNN_OH. It compares favorably with traditional machine-learning models and other deep-learning models across different species, in terms of both cross-validation and independent tests. The area under the ROC curve (AUC) values for CNN_OH ranged from 0.82 to 0.87 for different organisms, which is superior to the currently available Khib predictors. Moreover, we developed a general model based on integrated data from multiple species, which showed great universality and effectiveness, with AUC values in the range of 0.79-0.87. Accordingly, we constructed an online prediction tool, dubbed DeepKhib, for easily identifying Khib sites; it includes both species-specific and general models. DeepKhib is available at.
Pages: 11