Sequence-based prediction of DNA-binding sites on DNA-binding proteins

Cited by: 0
Authors
Gou, Z. [1]
Hwang, S. [1]
Kuznetsov, I. B. [1]
Affiliation
[1] SUNY Albany, Gen NY Sis Ctr Excellence Canc Genom, One Discovery Dr, Rensselaer, NY USA
Source
PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1 | 2006
Keywords
protein-DNA interaction; position specific scoring matrix; evolutionary conservation; web-server; DNA binding; prediction; pattern recognition; machine learning;
DOI
Not available
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Identification of DNA-binding sites on DNA-binding proteins is important for functional annotation. Experimental determination of the structure of a protein-DNA complex is expensive, so reliable computational methods that predict the DNA-binding interface of a DNA-binding protein from its sequence are needed.
Results: We apply three machine learning methods (support vector machine, kernel logistic regression, and penalized logistic regression) to predict DNA-binding sites on a DNA-binding protein using its amino acid sequence as input. Prediction is performed using either a single sequence or a profile of evolutionary conservation. The performance of our predictors is better than that of other existing sequence-based methods. The outputs of all three individual methods are combined to obtain a consensus prediction. This further improves performance and results in accuracy of 82.4%, sensitivity of 84.9% and specificity of 83.1% for the strict consensus prediction.
Availability: http://lcg.rit.albany.edu/dp-bind
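The consensus step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not define "strict consensus", so the code assumes it means keeping only the residues on which all three classifiers agree, and it adds a majority vote for comparison. The function names, the 0/1 label encoding, and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def strict_consensus(svm: Sequence[int], klr: Sequence[int],
                     plr: Sequence[int]) -> List[Optional[int]]:
    """Keep a label only where all three predictors agree (assumed
    meaning of 'strict consensus'); otherwise return None."""
    if not (len(svm) == len(klr) == len(plr)):
        raise ValueError("all prediction lists must have the same length")
    return [a if a == b == c else None for a, b, c in zip(svm, klr, plr)]


def majority_consensus(svm: Sequence[int], klr: Sequence[int],
                       plr: Sequence[int]) -> List[int]:
    """Label a residue as binding (1) when at least two predictors say so."""
    if not (len(svm) == len(klr) == len(plr)):
        raise ValueError("all prediction lists must have the same length")
    return [1 if a + b + c >= 2 else 0 for a, b, c in zip(svm, klr, plr)]


def evaluate(pred: Sequence[Optional[int]],
             truth: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity over residues that received
    a label; coverage is the fraction of residues that were labeled."""
    tp = tn = fp = fn = 0
    for p, t in zip(pred, truth):
        if p is None:          # residue skipped by the strict consensus
            continue
        if p == 1 and t == 1:
            tp += 1
        elif p == 0 and t == 0:
            tn += 1
        elif p == 1 and t == 0:
            fp += 1
        else:
            fn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    labeled = tp + tn + fp + fn
    return {
        "accuracy": ratio(tp + tn, labeled),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "coverage": ratio(labeled, len(truth)),
    }


# Toy per-residue predictions (1 = DNA-binding, 0 = non-binding).
svm = [1, 1, 0, 0, 1, 0]
klr = [1, 0, 0, 0, 1, 1]
plr = [1, 1, 0, 1, 1, 0]
truth = [1, 1, 0, 0, 0, 0]

strict = strict_consensus(svm, klr, plr)
majority = majority_consensus(svm, klr, plr)
strict_metrics = evaluate(strict, truth)
majority_metrics = evaluate(majority, truth)
```

Under this assumption the strict consensus trades coverage for agreement: residues on which the predictors disagree are left unlabeled and are excluded from the reported metrics.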
Pages: 268 / +
Number of pages: 2