Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites

Cited by: 21
Authors
Bauer, Amy L. [1]
Hlavacek, William S. [1]
Unkefer, Pat J. [2]
Mu, Fangping [1]
Affiliations
[1] Los Alamos Natl Lab, Div Theoret, Theoret Biol & Biophys Grp, Los Alamos, NM 87545 USA
[2] Los Alamos Natl Lab, Natl Stable Isotope Resource, Biosci Div, Los Alamos, NM USA
Keywords
UNIQUE TETRANUCLEOTIDE SEQUENCES; MOLECULAR-DYNAMICS SIMULATIONS; ESCHERICHIA-COLI K-12; INDIRECT READOUT; BASE-PAIR; PROTEIN; OLIGONUCLEOTIDES; RECOGNITION; DATABASE; DESIGN;
DOI
10.1371/journal.pcbi.1001007
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
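For orientation, the conventional baseline mentioned in the abstract (a position-specific weight matrix built from known binding sites) can be sketched as follows. This is a minimal illustration only, not SiteSleuth and not any of the compared tools: the function names, the uniform background frequency, the pseudocount, the threshold, and the toy sites are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length known sites.

    Assumes a uniform background (0.25 per base) and an additive
    pseudocount; both are illustrative choices, not values from the paper.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        # log2 ratio of observed base frequency to background frequency
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores for one candidate window."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits

# Hypothetical toy sites, invented for this example (not from RegulonDB).
toy_sites = ["TTGACA", "TTGACT", "TTGATA", "TTTACA", "TAGACA"]
pwm = build_pwm(toy_sites)
hits = scan(pwm, "GGCGTTGACAGGCC", threshold=4.0)
```

A matrix of this kind scores each position independently from sequence statistics alone. The approach described in the abstract instead maps candidate sites to chemical and structural feature vectors and trains a binary classifier on them.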
Pages: 13