Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines

Cited by: 55
Authors
Taherzadeh, Ghazaleh [1]
Zhou, Yaoqi [1,2]
Liew, Alan Wee-Chung [1]
Yang, Yuedong [1,2]
Affiliations
[1] Griffith University, School of Information and Communication Technology, Parklands Drive, Southport, QLD 4215, Australia
[2] Griffith University, Institute for Glycomics, Parklands Drive, Southport, QLD 4215, Australia
Funding
Australian Research Council; UK Medical Research Council
Keywords
STRUCTURAL FEATURES; SECONDARY STRUCTURE; DNA INTERACTIONS; RESIDUES; IDENTIFICATION; GENERATION; LIGAND; NUCLEOTIDE; ACCURATE; REGIONS
DOI
10.1021/acs.jcim.6b00320
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Carbohydrate-binding proteins play significant roles in many diseases, including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent-accessible surface area leads to a reasonably accurate, robust, and predictive method. Without balancing binding and nonbinding residues, it achieves an area under the receiver operating characteristic curve (AUC) of 0.78 and 0.77 and a Matthews correlation coefficient (MCC) of 0.34 and 0.29 for 10-fold cross-validation and an independent test, respectively. The quality of the method is further demonstrated by statistically significantly more binding residues being predicted for carbohydrate-binding proteins than for presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 Genomes Project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH.
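The two metrics quoted in the abstract, AUC and the Matthews correlation coefficient, can be illustrated with a minimal Python sketch for per-residue binary labels (1 = binding, 0 = nonbinding). This is not the SPRINT-CBH implementation: the function names `mcc` and `auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration only.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random binding residue outscores a
    random nonbinding residue (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one residue of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: an unbalanced set with 3 binding residues out of 10.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.35, 0.05, 0.5]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(f"MCC = {mcc(labels, preds):.3f}")
print(f"AUC = {auc(labels, scores):.3f}")
```

AUC depends only on how the scores rank the residues, so it needs no threshold, whereas MCC is computed from thresholded predictions and uses all four confusion-matrix counts. Reporting both suits unbalanced data in which nonbinding residues greatly outnumber binding ones.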
Pages: 2115-2122
Page count: 8