Glypre: In Silico Prediction of Protein Glycation Sites by Fusing Multiple Features and Support Vector Machine

Cited by: 16
Authors
Zhao, Xiaowei [1,2]
Zhao, Xiaosa [1 ]
Bao, Lingling [1 ]
Zhang, Yonggang [2 ]
Dai, Jiangyan [3 ]
Yin, Minghao [1 ]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Jilin, Peoples R China
[2] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
[3] Weifang Univ, Sch Comp Engn, Weifang 261061, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
glycation sites; support vector machine; feature analysis; SEQUENCE-DERIVED FEATURES; END-PRODUCTS; RESOURCE;
DOI
10.3390/molecules22111891
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
Glycation is a non-enzymatic process, occurring inside or outside the host body, in which a sugar molecule attaches to a protein or lipid molecule. It is an important form of post-translational modification (PTM) that impairs the function and changes the characteristics of proteins, so identifying glycation sites may provide useful guidance for understanding various biological functions of proteins. In this study, we proposed an accurate prediction tool, named Glypre, for lysine glycation. Firstly, we used multiple informative features to encode the peptides: the position scoring function, secondary structure, AAindex, and the composition of k-spaced amino acid pairs. Secondly, we statistically analysed the distribution of distinctive features of the residues surrounding glycation and non-glycation sites. Thirdly, based on the distribution of these features, we developed a new predictor that uses a different optimal window size for each property and a two-step feature selection method, in which the maximum relevance minimum redundancy method is followed by a greedy feature selection procedure. In 10-fold cross-validation, Glypre achieved a sensitivity of 57.47%, a specificity of 90.78%, an accuracy of 79.68%, an area under the receiver-operating characteristic (ROC) curve (AUC) of 0.86, and a Matthews correlation coefficient (MCC) of 0.52. The detailed analysis showed that our predictor may play a complementary role to other existing methods for identifying protein lysine glycation. The source code and datasets of Glypre are available in the Supplementary File.
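As an illustration of two ideas named in the abstract, the following minimal Python sketch encodes a peptide with the composition of k-spaced amino acid pairs and computes sensitivity, specificity, accuracy, and MCC from a confusion matrix. It is not the Glypre implementation: the function names `cksaap` and `binary_metrics`, the choice of `k_max`, and the normalisation by the number of residue pairs at each gap are assumptions made here for illustration.

```python
import math
from itertools import product

# The 20 standard amino acids; the ordering is an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Returns a list of 400 * (k_max + 1) frequencies. Pairs that contain a
    non-standard symbol (for example a padding character) are not counted,
    but they still contribute to the normalising total.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(peptide) - k - 1  # number of residue pairs separated by k
        for i in range(max(total, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    vector = cksaap("AKA", k_max=1)
    print(len(vector))
    print(binary_metrics(tp=50, fp=10, tn=90, fn=50))
```

A vector like this would be one of several feature groups passed to feature selection and then to a support vector machine; the window sizes and selected features used by Glypre are not given in the abstract and are not reproduced here.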
Pages: 15