Position dependencies in transcription factor binding sites

Cited by: 61

Authors:

Tomovic, Andrija [1]

Oakeley, Edward J. [1]

Affiliations:

[1] Novartis Res Fdn, Friedrich Miescher Inst Biomed Res, CH-4058 Basel, Switzerland

Source:

BIOINFORMATICS | 2007 / Vol. 23 / No. 08

Keywords:

DOI:

10.1093/bioinformatics/btm055

Chinese Library Classification code:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Most of the available tools for transcription factor binding site prediction are based on methods which assume no sequence dependence between the binding site base positions. Our primary objective was to investigate the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and to use the resulting data to develop improved scoring functions for binding-site prediction.

Results: Using three statistical tests, we analyzed the number of binding sites showing dependent positions. We analyzed transcription factor-DNA crystal structures for evidence of position dependence. Our final conclusions were that some factors show evidence of dependencies whereas others do not. We observed that the conformational energy (Z-score) of the transcription factor-DNA complexes was lower (better) for sequences that showed dependency than for those that did not (P < 0.02). We suggest that where evidence exists for dependencies, these should be modeled to improve binding-site predictions. However, when no significant dependency is found, this correction should be omitted. This may be done by converting any existing scoring function which assumes independence into a form which includes a dependency correction. We present an example of such an algorithm and its implementation as a web tool.
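The idea in the abstract of testing two positions for dependence and then adding a correction to an independence-based score can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not name its three statistical tests or give the form of its correction, so the Pearson chi-square statistic, the log-ratio correction term, the pseudocount scheme, the function names, and the toy binding sites below are all assumptions made for illustration.

```python
import math
from collections import Counter

def chi_square_statistic(sites, i, j):
    """Pearson chi-square statistic for independence of columns i and j
    in a list of aligned, equal-length binding sites (illustrative test)."""
    n = len(sites)
    pair = Counter((s[i], s[j]) for s in sites)
    left = Counter(s[i] for s in sites)
    right = Counter(s[j] for s in sites)
    stat = 0.0
    for a in left:
        for b in right:
            expected = left[a] * right[b] / n
            stat += (pair[(a, b)] - expected) ** 2 / expected
    return stat

def base_freq(sites, k, base, pseudo=1.0):
    """Smoothed frequency of `base` at position k (pseudocount per base)."""
    count = sum(1 for s in sites if s[k] == base)
    return (count + pseudo) / (len(sites) + 4 * pseudo)

def independent_score(sites, seq, pseudo=1.0, background=0.25):
    """Log-odds score that treats every position as independent."""
    return sum(
        math.log2(base_freq(sites, k, base, pseudo) / background)
        for k, base in enumerate(seq)
    )

def corrected_score(sites, seq, dependent_pairs, pseudo=1.0):
    """Independent score plus one correction term per dependent pair:
    log2( P(a, b) / (P(a) * P(b)) ). Hypothetical form of the correction."""
    n = len(sites)
    score = independent_score(sites, seq, pseudo)
    for i, j in dependent_pairs:
        joint_count = sum(1 for s in sites if s[i] == seq[i] and s[j] == seq[j])
        # pseudo / 4 per dinucleotide keeps the joint consistent with the marginals
        joint = (joint_count + pseudo / 4) / (n + 4 * pseudo)
        marginals = base_freq(sites, i, seq[i], pseudo) * base_freq(sites, j, seq[j], pseudo)
        score += math.log2(joint / marginals)
    return score

# Toy alignment (made up): positions 0 and 1 always agree, position 2 is constant.
sites = ["AAG", "AAG", "CCG", "CCG"]
stat_01 = chi_square_statistic(sites, 0, 1)
stat_02 = chi_square_statistic(sites, 0, 2)
```

With an empty `dependent_pairs` list, `corrected_score` reduces to `independent_score`, which mirrors the abstract's recommendation to omit the correction when no significant dependency is found. In practice the statistic would be compared against a significance threshold before a pair is added to `dependent_pairs`; that step is left out of this sketch.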


Pages: 933-941

Page count: 9
