PredGly: predicting lysine glycation sites for Homo sapiens based on XGboost feature optimization

Cited by: 54
Authors
Yu, Jialin [1, 2]
Shi, Shaoping [1, 2]
Zhang, Fang [1, 2]
Chen, Guodong [1, 2]
Cao, Man [1, 2]
Affiliations
[1] Nanchang Univ, Sch Sci, Dept Math, Nanchang 330031, Jiangxi, Peoples R China
[2] Nanchang Univ, Sch Sci, Numer Simulat & High Performance Comp Lab, Nanchang 330031, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
END-PRODUCTS; LOCALIZATION; METHYLATION; SEQUENCE; DISEASE;
DOI
10.1093/bioinformatics/bty1043
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Protein glycation is a common post-translational modification (PTM) that proceeds as a two-step non-enzymatic reaction. Glycation impairs protein function and alters protein characteristics, and is therefore related to many human diseases. Systematically detecting glycation sites remains difficult because glycated residues lack clear patterns. Computational approaches, which can filter putative sites prior to experimental verification, can greatly increase the efficiency of experimental work. However, the previous lysine glycation prediction method was trained on a small dataset, so its model does not generalize well. Results: By searching a new database, we collected a large dataset for Homo sapiens. PredGly, a novel software tool developed by combining multiple features, predicts lysine glycation sites for H. sapiens. In addition, XGboost was adopted to optimize the feature vectors and improve model performance. Among the classifiers compared, the support vector machine achieved the best performance. On a new independent test set, PredGly outperformed other glycation tools. This suggests that PredGly can provide more instructive guidance for further experimental research on lysine glycation.
Pages: 2749-2756
Page count: 8
Related papers
37 in total
[1]   [Anonymous], 2016, KDD16 P 22 ACM, DOI 10.1145/2939672.2939785
[2]   Computational Prediction and Analysis for Tyrosine Post Translational Modifications via Elastic Net [J].
Cao, Man ;
Chen, Guodong ;
Wang, Lina ;
Wen, Pingping ;
Shi, Shaoping .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2018, 58 (06) :1272-1281
[3]   ProAcePred: prokaryote lysine acetylation sites prediction based on elastic net feature optimization [J].
Chen, Guodong ;
Cao, Man ;
Luo, Kun ;
Wang, Lina ;
Wen, Pingping ;
Shi, Shaoping .
BIOINFORMATICS, 2018, 34 (23) :3999-4006
[4]   Greedy function approximation: A gradient boosting machine [J].
Friedman, JH .
ANNALS OF STATISTICS, 2001, 29 (05) :1189-1232
[5]   Musite, a Tool for Global Prediction of General and Kinase-specific Phosphorylation Sites [J].
Gao, Jianjiong ;
Thelen, Jay J. ;
Dunker, A. Keith ;
Xu, Dong .
MOLECULAR & CELLULAR PROTEOMICS, 2010, 9 (12) :2586-2600
[6]   AMINO-ACID SUBSTITUTION MATRICES FROM PROTEIN BLOCKS [J].
HENIKOFF, S ;
HENIKOFF, JG .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1992, 89 (22) :10915-10919
[7]   CD-HIT Suite: a web server for clustering and comparing biological sequences [J].
Huang, Ying ;
Niu, Beifang ;
Gao, Ying ;
Fu, Limin ;
Li, Weizhong .
BIOINFORMATICS, 2010, 26 (05) :680-682
[8]   iProtGly-SS: Identifying protein glycation sites using sequence and structure based features [J].
Islam, Md Mofijul ;
Saha, Sanjay ;
Rahman, Md Mahmudur ;
Shatabda, Swakkhar ;
Farid, Dewan Md ;
Dehzangi, Abdollah .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2018, 86 (07) :777-789
[9]   O-GlcNAcPRED-II: an integrated classification algorithm for identifying O-GlcNAcylation sites based on fuzzy undersampling and a K-means PCA oversampling technique [J].
Jia, Cangzhi ;
Zuo, Yun ;
Zou, Quan .
BIOINFORMATICS, 2018, 34 (12) :2029-2036
[10]   Analysis and prediction of mammalian protein glycation [J].
Johansen, Morten Bo ;
Kiemer, Lars ;
Brunak, Soren .
GLYCOBIOLOGY, 2006, 16 (09) :844-853