Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information

Cited by: 236
Authors
Wei, Leyi [1]
Tang, Jijun [1, 2]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA
Funding
National Natural Science Foundation of China;
Keywords
DNA-binding protein prediction; Random forest; Local evolutionary information; Machine learning-based method; Feature representation algorithm; AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINES; RIBOSOMAL-RNA-BINDING; WEB SERVER; IDENTIFICATION; GENERATION; PROFILES; PROTOCOL; SITES; SVM;
DOI
10.1016/j.ins.2016.06.026
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Increased knowledge of DNA-binding proteins would enhance our understanding of protein functions in cellular biological processes. To handle the explosive growth of protein sequence data, researchers have developed machine learning-based methods that quickly and accurately predict DNA-binding proteins. In recent years, the predictive accuracy of machine learning-based predictors has significantly advanced, but the predictive performance remains unsatisfactory. In this paper, we establish a novel predictor named Local-DPP, which combines the local Pse-PSSM (Pseudo Position-Specific Scoring Matrix) features with the random forest classifier. The proposed features can efficiently capture the local conservation information, together with the sequence-order information, from the evolutionary profiles (PSSMs). We evaluate and compare the Local-DPP predictor with state-of-the-art predictors on two stringent benchmark datasets (one for the jackknife test, the other for an independent test). The proposed Local-DPP significantly improved the accuracy of the existing predictors, from 77.3% to 79.2% and 76.9% to 79.0% in the jackknife and independent tests, respectively. This demonstrates the efficacy and effectiveness of Local-DPP in predicting DNA-binding proteins. © 2016 Elsevier Inc. All rights reserved.
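The abstract names the feature idea (local Pse-PSSM) without giving formulas, so the following is only a rough sketch of how such features might be computed, not the authors' implementation. It assumes the PSSM is a list of per-residue rows of 20 scores, splits it into contiguous segments, and for each segment takes the column means (conservation) plus lagged mean squared differences (sequence order), in the spirit of the general Pse-PSSM idea. The function names, the parameters `n_segments` and `max_lag`, and the logistic normalization are all assumptions made for illustration.

```python
import math


def normalize(pssm):
    """Squash raw PSSM scores into (0, 1) with a logistic function (assumed choice)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in pssm]


def pse_pssm(segment, max_lag):
    """Pse-PSSM-style vector for one segment (hypothetical helper).

    Returns the mean of each column, followed by, for each lag 1..max_lag,
    the mean squared difference between rows that are `lag` positions apart.
    """
    length, width = len(segment), len(segment[0])
    features = [sum(row[j] for row in segment) / length for j in range(width)]
    for lag in range(1, max_lag + 1):
        pairs = length - lag
        for j in range(width):
            if pairs <= 0:
                # Segment too short for this lag: fall back to zero.
                features.append(0.0)
            else:
                total = sum(
                    (segment[i][j] - segment[i + lag][j]) ** 2
                    for i in range(pairs)
                )
                features.append(total / pairs)
    return features


def local_pse_pssm(pssm, n_segments, max_lag):
    """Concatenate Pse-PSSM-style vectors over contiguous segments of the PSSM."""
    if len(pssm) < n_segments:
        raise ValueError("sequence is shorter than the number of segments")
    scaled = normalize(pssm)
    length = len(scaled)
    # Integer boundaries that split the rows into near-equal segments.
    bounds = [k * length // n_segments for k in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(pse_pssm(scaled[start:end], max_lag))
    return features
```

Under these assumptions each protein yields a fixed-length vector of `n_segments * 20 * (1 + max_lag)` values regardless of sequence length, which is the kind of input a random forest classifier expects. The abstract does not state the segment count, lag range, or forest hyperparameters, so those would need to come from the full paper.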
Pages: 135-144
Page count: 10