Sequence-based Protein-Ca2+ Binding Site Prediction Using SVM Classifier Ensemble with Random Under-Sampling

Cited by: 0
Authors
Qiao, Liang [1]
Xie, Dongqing [1]
Affiliations
[1] Guangzhou Univ, Sch Math & Informat Sci, Guangzhou 510006, Guangdong, Peoples R China
Source
PROCEEDINGS OF 2017 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (PIC 2017) | 2017
Funding
National Natural Science Foundation of China;
Keywords
Protein-Ca2+ binding site prediction; Imbalanced data learning; Random under-sampling; Support vector machine; BINDING-SITES; PROTEIN; RESIDUES; DATABASE;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Calcium ions (Ca2+) are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Accurately recognizing Ca2+-binding sites is therefore important for protein function analysis. Although much progress has been made, challenges remain, especially in the post-genome era, in which large numbers of proteins without functional annotation are rapidly accumulating. In this study, we design a new ab initio predictor, CaSite, to identify Ca2+-binding residues from protein sequence. CaSite first represents each residue sample with features derived from evolutionary information, predicted secondary structure, predicted solvent accessibility, and Jensen-Shannon divergence. The final prediction model is a mean ensemble of support vector machine (SVM) classifiers trained on multiple random under-samplings, which relieves the negative influence of the imbalance between positive and negative training samples. Experimental results demonstrate that CaSite outperforms the existing sequence-based predictor TargetS.
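The ensemble idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the number of under-sampling rounds, the SVM kernel, or how the member outputs are averaged, so the values below are assumptions. The function names (train_linear_svm, under_sample, train_ensemble, mean_score) are hypothetical, and the base learner is a small linear SVM trained by sub-gradient descent on the hinge loss, standing in for a full SVM library so the example needs only the Python standard library.

```python
import random


def train_linear_svm(X, y, epochs=50, eta=0.1, lam=0.01, seed=0):
    """Linear SVM via sub-gradient descent on the regularized hinge loss.

    A stand-in for a real SVM library; labels must be +1 / -1.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # L2 shrinkage is applied on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def under_sample(X, y, rng):
    """Keep every positive and draw an equal number of negatives at random."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [X[i] for i in keep], [y[i] for i in keep]


def train_ensemble(X, y, n_models=5, seed=0):
    """Train one SVM per independent random under-sampling (n_models is assumed)."""
    rng = random.Random(seed)
    models = []
    for k in range(n_models):
        Xs, ys = under_sample(X, y, rng)
        models.append(train_linear_svm(Xs, ys, seed=seed + k))
    return models


def mean_score(models, x):
    """Mean ensemble: average the decision values of all member SVMs."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in models]
    return sum(scores) / len(scores)
```

Here X is a list of per-residue feature vectors and y holds +1 for binding and -1 for non-binding residues. A residue would be called binding when mean_score exceeds a chosen threshold; averaging raw decision values rather than calibrated probabilities is a simplification made for this sketch.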
Pages: 86-90
Page count: 5