Comparative analysis of methods for representing and searching for transcription factor binding sites

Times cited: 54
Authors
Osada, R
Zaslavsky, E
Singh, M [1]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Princeton Univ, Lewis Sigler Inst Integrat Gen, Princeton, NJ 08544 USA
Funding
US National Science Foundation
DOI
10.1093/bioinformatics/bth438
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: An important step in unravelling the transcriptional regulatory network of an organism is to identify, for each transcription factor, all of its DNA binding sites. Several approaches are commonly used in searching for a transcription factor's binding sites, including consensus sequences and position-specific scoring matrices. In addition, methods that compute the average number of nucleotide matches between a putative site and all known sites can be employed. Such basic approaches can all be naturally extended by incorporating pairwise nucleotide dependencies and per-position information content. In this paper, we evaluate the effectiveness of these basic approaches and their extensions in finding binding sites for a transcription factor of interest without erroneously identifying other genomic sequences.

Results: In cross-validation testing on a dataset of Escherichia coli transcription factors and their binding sites, we show that there are statistically significant differences in how well various methods identify transcription factor binding sites. The use of per-position information content improves the performance of all basic approaches. Furthermore, including local pairwise nucleotide dependencies within binding site models results in statistically significant performance improvements for approaches based on nucleotide matches. Based on our analysis, the best results when searching for DNA binding sites of a particular transcription factor are obtained by methods that incorporate both information content and local pairwise correlations.
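The basic approaches named in the abstract (consensus sequences, position-specific scoring matrices, average nucleotide matches) and two of their extensions (per-position information-content weighting and local pairwise dependencies) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The uniform 0.25 background, the 0.25 pseudocount, the toy sites, and every function name are assumptions made for illustration. The adjacent-pair score is only one hypothetical way to encode local pairwise dependencies for a match-based method.

```python
import math
from collections import Counter

BASES = "ACGT"
BACKGROUND = 0.25   # assumption: uniform background nucleotide frequency
PSEUDOCOUNT = 0.25  # assumption: per-base pseudocount added to every column


def column_counts(sites):
    """Per-position nucleotide counts for a set of aligned, equal-length sites."""
    length = len(sites[0])
    assert all(len(s) == length for s in sites), "sites must be aligned and equal length"
    return [Counter(site[i] for site in sites) for i in range(length)]


def column_frequencies(sites):
    """Pseudocount-smoothed nucleotide frequencies per position."""
    n = len(sites)
    return [
        {b: (counts[b] + PSEUDOCOUNT) / (n + 4 * PSEUDOCOUNT) for b in BASES}
        for counts in column_counts(sites)
    ]


def consensus(sites):
    """Most frequent nucleotide at each position (ties go to the first base seen)."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(sites))


def build_pssm(sites):
    """Log-odds PSSM: log2(frequency / background) for each base at each position."""
    return [{b: math.log2(f[b] / BACKGROUND) for b in BASES} for f in column_frequencies(sites)]


def pssm_score(pssm, candidate):
    """Sum of per-position log-odds scores for a candidate of the same length."""
    return sum(row[base] for row, base in zip(pssm, candidate))


def information_content(sites):
    """Per-position information content in bits, 2 - H(column), vs. a uniform background."""
    return [2.0 - sum(-p * math.log2(p) for p in f.values()) for f in column_frequencies(sites)]


def average_match_score(sites, candidate, weights=None):
    """Average (optionally weighted) number of positions at which candidate matches each known site."""
    if weights is None:
        weights = [1.0] * len(candidate)
    total = sum(
        sum(w for w, a, b in zip(weights, site, candidate) if a == b)
        for site in sites
    )
    return total / len(sites)


def average_pair_match_score(sites, candidate):
    """Hypothetical local-dependency variant: average number of adjacent position
    pairs (i, i+1) at which candidate matches a known site at both positions."""
    total = sum(
        sum(1 for i in range(len(candidate) - 1)
            if site[i] == candidate[i] and site[i + 1] == candidate[i + 1])
        for site in sites
    )
    return total / len(sites)


# Toy aligned sites, invented for illustration only (not data from the paper).
sites = ["TGTGA", "TGTGA", "TGAGA", "TTTGA"]
pssm = build_pssm(sites)
ic = information_content(sites)
print(consensus(sites))                          # TGTGA
print(average_match_score(sites, "TGTGA"))       # 4.5
print(average_pair_match_score(sites, "TGTGA"))  # 3.0
print(round(pssm_score(pssm, "TGTGA"), 3), round(pssm_score(pssm, "TGAGA"), 3))
print(round(average_match_score(sites, "TGTGA", weights=ic), 3))
```

Passing the information-content vector as `weights` makes matches at well-conserved positions count more than matches at variable ones. This mirrors, in spirit, the per-position information-content extension described in the abstract, although the paper's exact weighting and scoring formulas may differ.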
Pages: 3516-3525
Number of pages: 10