Sequence-based prediction of DNA-binding sites on DNA-binding proteins

Times cited: 0

Authors:

Gou, Z. [1]

Hwang, S. [1]

Kuznetsov, I. B. [1]

Affiliation:

[1] SUNY Albany, GenNYsis Center for Excellence in Cancer Genomics, One Discovery Drive, Rensselaer, NY, USA

Source:

Proceedings of the Fifth International Conference on Bioinformatics of Genome Regulation and Structure, Vol. 1 | 2006

Keywords:

protein-DNA interaction; position-specific scoring matrix; evolutionary conservation; web server; DNA binding; prediction; pattern recognition; machine learning

DOI:

Not available

Chinese Library Classification number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Motivation: Identification of DNA-binding sites on DNA-binding proteins is important for functional annotation. Experimental determination of the structure of a protein-DNA complex is expensive, so reliable computational methods that predict the DNA-binding interface from the sequence of a DNA-binding protein are needed.

Results: We present an application of three machine learning methods (support vector machine, kernel logistic regression, and penalized logistic regression) to predict DNA-binding sites on a DNA-binding protein using its amino acid sequence as input. Prediction is performed using either the single sequence or a profile of evolutionary conservation. Our predictors perform better than other existing sequence-based methods. The outputs of the three individual methods are combined to obtain a consensus prediction, which further improves performance and yields an accuracy of 82.4%, a sensitivity of 84.9%, and a specificity of 83.1% for the strict consensus prediction.

Availability: http://lcg.rit.albany.edu/dp-bind
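The consensus step and the reported metrics can be illustrated with a minimal sketch. It assumes each of the three predictors (SVM, kernel logistic regression, penalized logistic regression) has already produced a binary call for every residue, and it assumes that "strict consensus" means a residue is labeled only when all three calls agree, with disagreements left unlabeled. The majority-vote variant is included only for comparison and is likewise an assumption. All function names are hypothetical and are not taken from the DP-Bind server. Sensitivity, specificity, and accuracy use their standard definitions, computed over the residues that received a call.

```python
from typing import Optional, Sequence

# 1 = DNA-binding residue, 0 = non-binding residue, None = no call made
Label = Optional[int]


def _check_lengths(*seqs: Sequence) -> None:
    # All per-residue vectors must describe the same protein positions.
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all per-residue vectors must have the same length")


def strict_consensus(svm: Sequence[int], klr: Sequence[int],
                     plr: Sequence[int]) -> list[Label]:
    """Assumed rule: call a residue only when all three predictors agree."""
    _check_lengths(svm, klr, plr)
    return [a if a == b == c else None for a, b, c in zip(svm, klr, plr)]


def majority_consensus(svm: Sequence[int], klr: Sequence[int],
                       plr: Sequence[int]) -> list[Label]:
    """Assumed alternative: label every residue by a 2-of-3 vote."""
    _check_lengths(svm, klr, plr)
    return [1 if a + b + c >= 2 else 0 for a, b, c in zip(svm, klr, plr)]


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def evaluate(pred: Sequence[Label], truth: Sequence[int]) -> dict[str, float]:
    """Residue-level metrics over the residues that received a call."""
    _check_lengths(pred, truth)
    tp = tn = fp = fn = 0
    for p, t in zip(pred, truth):
        if p is None:          # abstentions are excluded from the counts
            continue
        if p == 1 and t == 1:
            tp += 1
        elif p == 0 and t == 0:
            tn += 1
        elif p == 1:           # predicted binding, actually non-binding
            fp += 1
        else:                  # predicted non-binding, actually binding
            fn += 1
    called = tp + tn + fp + fn
    return {
        "accuracy": _ratio(tp + tn, called),      # (TP + TN) / called
        "sensitivity": _ratio(tp, tp + fn),       # TP / (TP + FN)
        "specificity": _ratio(tn, tn + fp),       # TN / (TN + FP)
        "coverage": _ratio(called, len(truth)),   # fraction of residues called
    }


# Toy per-residue calls for a six-residue stretch (illustrative data only).
svm = [1, 1, 0, 0, 1, 0]
klr = [1, 0, 0, 0, 1, 1]
plr = [1, 1, 0, 1, 1, 0]
truth = [1, 1, 0, 0, 0, 0]

print(strict_consensus(svm, klr, plr))  # [1, None, 0, None, 1, None]
print(evaluate(strict_consensus(svm, klr, plr), truth))
print(evaluate(majority_consensus(svm, klr, plr), truth))
```

On the toy data, strict consensus abstains on three of the six residues, so its metrics rest on half of the positions; reporting coverage next to accuracy, sensitivity, and specificity keeps that trade-off visible.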

Pages: 268+

Number of pages: 2

References (10 in total):

[1] Ahmad, S.; Sarai, A. PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics, 2005, 6(1).

[2] Ahmad, S.; Gromiha, M. M.; Sarai, A. Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics, 2004, 20(4): 477-486.

[3] Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J. H.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17): 3389-3402.

[4] [Anonymous], 2000, SUPPORT VECTOR MACHI

[5] Henikoff, S.; Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America, 1992, 89(22): 10915-10919.

[6] Jones, S.; Shanahan, H. P.; Berman, H. M.; Thornton, J. M. Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucleic Acids Research, 2003, 31(24): 7189-7198.

[7] le Cessie, S. Journal of the Royal Statistical Society Series C (Applied Statistics), 1992, 41: 191.

[8] Qian, N.; Sejnowski, T. J. Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology, 1988, 202(4): 865-884.

[9] Tsuchiya, Y.; Kinoshita, K.; Nakamura, H. Structure-based prediction of DNA-binding sites on proteins using the empirical preference of electrostatic potential and the shape of molecular surfaces. Proteins: Structure, Function, and Bioinformatics, 2004, 55(4): 885-894.

[10] Zhu, J.; Hastie, T. Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 2005, 14(1): 185-205.
