SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties

Cited by: 52
Authors
Taherzadeh, Ghazaleh [1]
Dehzangi, Abdollah [2]
Golchin, Maryam [1]
Zhou, Yaoqi [1, 3]
Campbell, Matthew P. [3]
Affiliations
[1] Griffith Univ, Sch Informat & Commun Technol, Gold Coast, Qld 4215, Australia
[2] Morgan State Univ, Dept Comp Sci, Baltimore, MD 21251 USA
[3] Griffith Univ, Inst Glyc, Parklands Dr, Gold Coast, Qld 4215, Australia
Funding
UK Medical Research Council;
Keywords
AMINO-ACID; GENERATION; SEQUONS; LOGO;
DOI
10.1093/bioinformatics/btz215
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Protein glycosylation is one of the most abundant post-translational modifications and plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, because of the poor ionization efficiency and microheterogeneity of glycopeptides, identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N- and O-linked glycosylation sites, respectively. Results: The method, called SPRINT-Gly, achieved consistent results between ten-fold cross-validation and independent testing for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well on human glycoproteins and vice versa; however, owing to significant differences in O-linked sites, separate models were generated. Overall, the Matthews correlation coefficient of SPRINT-Gly is 18% and 50% higher than that of the next best method compared for N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure- and sequence-based features.
Pages: 4140-4146
Page count: 7