Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites

Cited by: 21
Authors
Bauer, Amy L. [1]
Hlavacek, William S. [1]
Unkefer, Pat J. [2]
Mu, Fangping [1]
Affiliations
[1] Los Alamos Natl Lab, Div Theoret, Theoret Biol & Biophys Grp, Los Alamos, NM 87545 USA
[2] Los Alamos Natl Lab, Natl Stable Isotope Resource, Biosci Div, Los Alamos, NM USA
Keywords
UNIQUE TETRANUCLEOTIDE SEQUENCES; MOLECULAR-DYNAMICS SIMULATIONS; ESCHERICHIA-COLI K-12; INDIRECT READOUT; BASE-PAIR; PROTEIN; OLIGONUCLEOTIDES; RECOGNITION; DATABASE; DESIGN;
DOI
10.1371/journal.pcbi.1001007
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
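As a point of reference for the conventional approaches named in the abstract, position-specific weight matrix scoring can be sketched as follows. This is a minimal illustration of the baseline idea only, not SiteSleuth and not the implementation of Match, MATRIX SEARCH, or QPMEME. The function names, the pseudocount, the uniform background frequency, the threshold, and the toy sites are all assumptions made for the example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length known binding sites.

    Assumes a uniform background base frequency; real genomes may differ.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window_score = score(pwm, sequence[start:start + width])
        if window_score >= threshold:
            hits.append((start, window_score))
    return hits


# Hypothetical toy sites for illustration; not taken from RegulonDB.
toy_sites = ["TGTGA", "TGTGA", "TGTGC", "TGAGA", "TGTGA"]
pwm = build_pwm(toy_sites)
hits = scan(pwm, "CCTGTGACC", threshold=5.0)
```

Because this score depends only on base identity at each position, it cannot use the chemical and structural features that the abstract says SiteSleuth's classifiers are trained on.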
Pages: 13
Related papers
46 entries in total
[1]   ReadOut: structure-based calculation of direct and indirect readout energies and specificities for protein-DNA recognition [J].
Ahmad, Shandar ;
Kono, Hidetoshi ;
Arauzo-Bravo, Marcos J. ;
Sarai, Akinori .
NUCLEIC ACIDS RESEARCH, 2006, 34 :W124-W127
[2]  
Aparicio Oscar, 2004, Curr Protoc Cell Biol, Chapter 17, DOI 10.1002/0471143030.cb1707s23
[3]   Indirect readout: detection of optimized subsequences and calculation of relative binding affinities using different DNA elastic potentials [J].
Becker, Nils B. ;
Wolff, Lars ;
Everaers, Ralf .
NUCLEIC ACIDS RESEARCH, 2006, 34 (19) :5638-5649
[4]   Additivity in protein-DNA interactions: how good an approximation is it? [J].
Benos, PV ;
Bulyk, ML ;
Stormo, GD .
NUCLEIC ACIDS RESEARCH, 2002, 30 (20) :4442-4451
[5]   Probabilistic code for DNA recognition by proteins of the EGR family [J].
Benos, PV ;
Lapedes, AS ;
Stormo, GD .
JOURNAL OF MOLECULAR BIOLOGY, 2002, 323 (04) :701-727
[6]   SELECTION OF DNA-BINDING SITES BY REGULATORY PROTEINS - STATISTICAL-MECHANICAL THEORY AND APPLICATION TO OPERATORS AND PROMOTERS [J].
BERG, OG ;
VONHIPPEL, PH .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 193 (04) :723-743
[7]   Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors [J].
Berger, Michael F. ;
Bulyk, Martha L. .
NATURE PROTOCOLS, 2009, 4 (03) :393-411
[8]   THE NUCLEIC-ACID DATABASE - A COMPREHENSIVE RELATIONAL DATABASE OF 3-DIMENSIONAL STRUCTURES OF NUCLEIC-ACIDS [J].
BERMAN, HM ;
OLSON, WK ;
BEVERIDGE, DL ;
WESTBROOK, J ;
GELBIN, A ;
DEMENY, T ;
HSIEH, SH ;
SRINIVASAN, AR ;
SCHNEIDER, B .
BIOPHYSICAL JOURNAL, 1992, 63 (03) :751-759
[9]   Molecular dynamics simulations of the 136 unique tetranucleotide sequences of DNA oligonucleotides. I. Research design and results on d(CpG) steps [J].
Beveridge, DL ;
Barreiro, G ;
Byun, KS ;
Case, DA ;
Cheatham, TE ;
Dixit, SB ;
Giudice, E ;
Lankas, F ;
Lavery, R ;
Maddocks, JH ;
Osman, R ;
Seibert, E ;
Sklenar, H ;
Stoll, G ;
Thayer, KM ;
Varnai, P ;
Young, MA .
BIOPHYSICAL JOURNAL, 2004, 87 (06) :3799-3813
[10]   MatInspector and beyond: promoter analysis based on transcription factor binding sites [J].
Cartharius, K ;
Frech, K ;
Grote, K ;
Klocke, B ;
Haltmeier, M ;
Klingenhoff, A ;
Frisch, M ;
Bayerlein, M ;
Werner, T .
BIOINFORMATICS, 2005, 21 (13) :2933-2942