StackDPPred: a stacking based prediction of DNA-binding protein from sequence

Cited by: 92

Authors:

Mishra, Avdesh [1]

Pokhrel, Pujan [1]

Hoque, Md Tamjidul [1]

Affiliation:

[1] Univ New Orleans, Dept Comp Sci, New Orleans, LA 70148 USA

Source:

BIOINFORMATICS | 2019 / Vol. 35 / Issue 03

Keywords:

ACCESSIBLE SURFACE-AREA; AMINO-ACID-COMPOSITION; ENERGY FUNCTION; SCORING MATRIX; GENERAL-FORM; WEB SERVER; IDENTIFICATION; GENOME; KERNEL; SITES;

DOI:

10.1093/bioinformatics/bty653

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Identification of DNA-binding proteins from sequence information alone is one of the most challenging problems in genome annotation. DNA-binding proteins play an important role in various biological processes such as DNA replication, repair, transcription and splicing. Existing experimental techniques for identifying DNA-binding proteins are time-consuming and expensive. Thus, computational prediction of DNA-binding proteins from sequence alone can be useful for quick annotation and for guiding the experimental process. Most methods developed for predicting DNA-binding proteins rely solely on the evolutionary profile, known as the position-specific scoring matrix (PSSM) profile, and their accuracies have been limited. Here, we propose a method, called StackDPPred, which utilizes features extracted from PSSM and residue-specific contact energy to train a stacking-based machine learning method for the effective prediction of DNA-binding proteins.

Results: Based on a benchmark set of 1063 proteins (518 DNA-binding and 545 non-DNA-binding) and using jackknife validation, StackDPPred achieved an accuracy (ACC) of 89.96%, a Matthews correlation coefficient (MCC) of 0.799 and an area under the ROC curve (AUC) of 94.50%. This outcome outperforms several state-of-the-art approaches. Furthermore, when tested on two recently designed independent test datasets, StackDPPred consistently outperforms existing approaches. The proposed StackDPPred can be used for effective prediction of DNA-binding proteins from sequence alone.
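The abstract reports ACC and MCC for a binary classifier (DNA-binding versus non-DNA-binding). As a reading aid, the following minimal sketch shows how these two metrics are conventionally computed from confusion-matrix counts. The function names and the counts in the example are hypothetical illustrations; they are not taken from the paper and do not reproduce its results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for binary classification.

    Ranges from -1 to +1. Returns 0.0 when any marginal count is zero,
    which is the usual convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy counts, chosen only to illustrate the calculation.
tp, tn, fp, fn = 45, 40, 10, 5
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"ACC = {acc:.2%}, MCC = {mcc:.3f}")
```

MCC is often reported alongside ACC because it accounts for all four confusion-matrix cells, so it stays informative when the two classes differ in size.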


Pages: 433-441

Page count: 9
