PredGly: predicting lysine glycation sites for Homo sapiens based on XGboost feature optimization

Cited by: 54
Authors
Yu, Jialin [1, 2]
Shi, Shaoping [1, 2]
Zhang, Fang [1, 2]
Chen, Guodong [1, 2]
Cao, Man [1, 2]
Affiliations
[1] Nanchang Univ, Sch Sci, Dept Math, Nanchang 330031, Jiangxi, Peoples R China
[2] Nanchang Univ, Sch Sci, Numer Simulat & High Performance Comp Lab, Nanchang 330031, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
END-PRODUCTS; LOCALIZATION; METHYLATION; SEQUENCE; DISEASE;
DOI
10.1093/bioinformatics/bty1043
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Protein glycation is a common post-translational modification (PTM) that proceeds through a two-step non-enzymatic reaction. Glycation both impairs the function of proteins and changes their characteristics, and it is therefore associated with many human diseases. Systematic detection of glycation sites remains difficult because glycated residues lack clear sequence patterns. Computational approaches, which can filter putative sites before experimental verification, can greatly increase the efficiency of experimental work. However, the previous lysine glycation prediction method was trained on a small dataset, so its model does not generalize well.
Results: By searching a new database, we collected a large dataset for Homo sapiens. PredGly, a novel software tool developed by combining multiple features, predicts lysine glycation sites for H. sapiens. In addition, XGBoost was adopted to optimize the feature vectors and improve model performance. In a comparison of various classifiers, the support vector machine achieved the best performance. On a new independent test set, PredGly outperformed other glycation tools. This suggests that PredGly can provide more instructive guidance for further experimental research on lysine glycation.
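As an illustration of how a site predictor of this kind typically enumerates its candidates, the following minimal sketch lists every lysine (K) in a protein sequence together with a fixed-length flanking window. It is not taken from PredGly: the function name `lysine_windows`, the half-window size `flank`, and the padding character `X` are all hypothetical choices made for the example, and the abstract does not state which window size or encoding the authors used.

```python
from typing import List, Tuple


def lysine_windows(sequence: str, flank: int = 7, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in `sequence`.

    `flank` and `pad` are illustrative assumptions, not values from the paper.
    Each window has length 2 * flank + 1 and is centered on the lysine.
    """
    seq = sequence.upper()
    # Pad both ends so that lysines near the termini still get full-length windows.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i of `seq` sits at index i + flank in `padded`,
            # so this slice is centered on it.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYIAKQR", flank=3):
        print(position, window)
    # 2 XXMKTAY
    # 8 YIAKQRX
```

In a full pipeline each window would then be encoded as a feature vector and scored by a classifier; those steps are outside this sketch.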
Pages: 2749-2756
Number of pages: 8