StackDPPred: a stacking based prediction of DNA-binding protein from sequence

Cited by: 92
Authors
Mishra, Avdesh [1 ]
Pokhrel, Pujan [1 ]
Hoque, Md Tamjidul [1 ]
Affiliations
[1] Univ New Orleans, Dept Comp Sci, New Orleans, LA 70148 USA
Keywords
ACCESSIBLE SURFACE-AREA; AMINO-ACID-COMPOSITION; ENERGY FUNCTION; SCORING MATRIX; GENERAL-FORM; WEB SERVER; IDENTIFICATION; GENOME; KERNEL; SITES
DOI
10.1093/bioinformatics/bty653
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Identifying DNA-binding proteins from sequence information alone is one of the most challenging problems in genome annotation. DNA-binding proteins play an important role in biological processes such as DNA replication, repair, transcription and splicing. Existing experimental techniques for identifying DNA-binding proteins are time-consuming and expensive, so computational prediction from sequence alone can help annotate proteins quickly and guide the experimental process. Most existing prediction methods rely solely on the evolutionary profile, called the position-specific scoring matrix (PSSM) profile, and their accuracy has been limited. Here, we propose a method, called StackDPPred, which uses features extracted from the PSSM and residue-specific contact energy to train a stacking-based machine learning method for the effective prediction of DNA-binding proteins.
Results: On a benchmark of 1063 proteins (518 DNA-binding and 545 non-DNA-binding) and using jackknife validation, StackDPPred achieved an accuracy (ACC) of 89.96%, a Matthews correlation coefficient (MCC) of 0.799 and an AUC of 94.50%, outperforming several state-of-the-art approaches. Furthermore, when tested on two recently designed independent test datasets, StackDPPred consistently outperforms existing approaches. StackDPPred can therefore be used for effective prediction of DNA-binding proteins from sequence alone.
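The ACC and MCC figures quoted in the abstract are standard binary-classification metrics computed from a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function names and the confusion counts in the demo are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to 1.

    Returns 0.0 when the denominator is zero (a common convention
    for the degenerate case where a row or column of the confusion
    matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper):
    # positives = DNA-binding, negatives = non-DNA-binding.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

MCC is often reported alongside ACC because it accounts for all four cells of the confusion matrix and stays informative when the two classes are not the same size.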
Pages: 433-441
Page count: 9