PGlcS: Prediction of protein O-GlcNAcylation sites with multiple features and analysis

Cited by: 14
Authors
Zhao, Xiaowei [1]
Ning, Qiao [1]
Chai, Haiting [1]
Ai, Meiyue [1]
Ma, Zhiqiang [1]
Affiliations
[1] NE Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
O-GlcNAcylated mechanisms; Support vector machines; A two-step feature selection; k-means cluster; AMINO-ACID-COMPOSITION; S-NITROSYLATION SITES; REMOTE HOMOLOGY DETECTION; SEQUENCE-BASED PREDICTOR; PHYSICOCHEMICAL PROPERTIES; ENSEMBLE CLASSIFIER; GENERAL-FORM; PSEAAC; IDENTIFICATION; MODES;
DOI
10.1016/j.jtbi.2015.06.026
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
As a widespread type of protein post-translational modification, O-GlcNAcylation plays crucial regulatory roles in almost all cellular processes and is related to some diseases. To understand O-GlcNAcylation mechanisms in depth, identification of substrates and specific O-GlcNAcylated sites is crucial. Experimental identification is expensive and time-consuming, so computational prediction of O-GlcNAcylated sites has considerable value. In this work, we developed a novel predictor of O-GlcNAcylated sites called PGlcS (Prediction of O-GlcNAcylated Sites), which uses k-means clustering to obtain informative and reliable negative samples and a support vector machine classifier combined with two-step feature selection. The performance of PGlcS was evaluated on an independent testing dataset, yielding a sensitivity of 64.62%, a specificity of 68.4%, an accuracy of 68.37%, and a Matthews correlation coefficient of 0.0697, which demonstrated that PGlcS is very promising for predicting O-GlcNAcylated sites. The datasets and source code are available in the Supplementary information. (C) 2015 Elsevier Ltd. All rights reserved.
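For readers unfamiliar with the four evaluation measures quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, accuracy, and the Matthews correlation coefficient are conventionally computed from confusion-matrix counts. The function name `binary_metrics` and the counts in the usage lines are hypothetical illustrations; they are not taken from the paper and do not reproduce its reported figures.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification measures from confusion-matrix counts.

    tp/fn: positive sites predicted correctly/incorrectly
    tn/fp: negative sites predicted correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)          # fraction of true sites recovered
    specificity = tn / (tn + fp)          # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the Matthews correlation coefficient depends on all four counts, it can be small even when accuracy looks moderate if negative samples greatly outnumber positive ones.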
Pages: 524-529
Page count: 6