Position dependencies in transcription factor binding sites

Cited by: 61
Authors
Tomovic, Andrija [1]
Oakeley, Edward J. [1]
Affiliation
[1] Novartis Research Foundation, Friedrich Miescher Institute for Biomedical Research, CH-4058 Basel, Switzerland
DOI
10.1093/bioinformatics/btm055
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Most available tools for transcription factor binding site prediction rely on methods that assume no dependence between the base positions of a binding site. Our primary objective was to examine the statistical basis for a claim of either dependence or independence, to determine whether such a claim holds in general, and to use the resulting data to develop improved scoring functions for binding-site prediction.

Results: Using three statistical tests, we assessed how many binding sites show dependent positions. We also analyzed transcription factor-DNA crystal structures for evidence of position dependence. We conclude that some factors show evidence of dependencies whereas others do not. The conformational energy (Z-score) of transcription factor-DNA complexes was lower (better) for sequences that showed dependency than for those that did not (P < 0.02). We suggest that where evidence for dependencies exists, they should be modeled to improve binding-site prediction; where no significant dependency is found, the correction should be omitted. This can be done by converting any existing scoring function that assumes independence into one that includes a dependency correction. We present an example of such an algorithm and its implementation as a web tool.
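The abstract does not name its three statistical tests or give the form of its dependency correction. As a rough illustration only, the sketch below uses Pearson's chi-squared test of independence on every pair of positions in a set of aligned, equal-length sites, and, for pairs that pass the test, adds a term log2 P(x_i, x_j) / (P(x_i) P(x_j)) to an ordinary position weight matrix (PWM) log-odds score. The function names (`chi2_sf`, `build_model`, `score`), the pseudocounts, the 0.01 threshold and the uniform background are all assumptions for illustration, not the authors' published algorithm. No multiple-testing correction is applied, and with only a few dozen sites the chi-squared approximation is weak, so an exact test would be preferable in practice.

```python
import math
from collections import Counter
from itertools import combinations

BASES = "ACGT"
PAIR_PSEUDO = 0.25             # assumed pseudocount per dinucleotide cell (hypothetical choice)
COL_PSEUDO = 4 * PAIR_PSEUDO   # per-base pseudocount that keeps column and pair marginals consistent


def chi2_sf(x, k):
    """Upper-tail probability of a chi-squared variable with k (integer) degrees of freedom."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) for even k or Q(1/2, y) for odd k, then step a -> a + 1
    # using Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    a, q = (1.0, math.exp(-y)) if k % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_pair(sites, i, j):
    """Pearson chi-squared statistic and degrees of freedom for positions i and j."""
    n = len(sites)
    joint = Counter((s[i], s[j]) for s in sites)
    row = Counter(s[i] for s in sites)
    col = Counter(s[j] for s in sites)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat, (len(row) - 1) * (len(col) - 1)


def column_probs(sites, i):
    """Smoothed base frequencies at position i (one PWM column)."""
    counts = Counter(s[i] for s in sites)
    total = len(sites) + 4 * COL_PSEUDO
    return {b: (counts[b] + COL_PSEUDO) / total for b in BASES}


def pair_probs(sites, i, j):
    """Smoothed joint dinucleotide frequencies at positions i and j."""
    counts = Counter((s[i], s[j]) for s in sites)
    total = len(sites) + 16 * PAIR_PSEUDO
    return {(a, b): (counts[(a, b)] + PAIR_PSEUDO) / total for a in BASES for b in BASES}


def build_model(sites, alpha=0.01):
    """PWM columns plus joint tables for position pairs that fail the independence test."""
    length = len(sites[0])
    cols = [column_probs(sites, i) for i in range(length)]
    pairs = {}
    for i, j in combinations(range(length), 2):
        stat, dof = chi_square_pair(sites, i, j)
        if dof > 0 and chi2_sf(stat, dof) < alpha:
            pairs[(i, j)] = pair_probs(sites, i, j)
    return cols, pairs


def score(seq, cols, pairs, background=0.25):
    """Log2-odds PWM score plus a log2 P(xi,xj) / (P(xi) P(xj)) term per dependent pair."""
    s = sum(math.log2(cols[i][b] / background) for i, b in enumerate(seq))
    for (i, j), pij in pairs.items():
        s += math.log2(pij[(seq[i], seq[j])] / (cols[i][seq[i]] * cols[j][seq[j]]))
    return s


if __name__ == "__main__":
    # Toy sites: positions 0 and 1 always co-vary (AC.. or GT..), position 2 varies freely.
    sites = []
    for c in BASES:
        sites += ["AC" + c] * 3 + ["GT" + c] * 3
    cols, pairs = build_model(sites)
    print(sorted(pairs))  # prints [(0, 1)]
    print(score("ACA", cols, pairs), score("ATA", cols, pairs))
```

In the toy data the plain PWM gives "ACA" and "ATA" identical scores, while the corrected score favors "ACA" because A at position 0 is only ever seen with C at position 1. Pairs that share a position each contribute their own term, so overlapping dependencies are corrected more than once; this keeps the sketch short but is an approximation rather than a normalized probability model.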
Pages: 933-941
Number of pages: 9
References (62 in total)
  • [1] AGRESTI A, 1990, CATEGORICAL DATA ANA
  • [2] ReadOut: structure-based calculation of direct and indirect readout energies and specificities for protein-DNA recognition
    Ahmad, Shandar
    Kono, Hidetoshi
    Arauzo-Bravo, Marcos J.
    Sarai, Akinori
    Nucleic Acids Research, 2006, 34: W124-W127
  • [3] Bailey TL, 1994, P 2 INT C INT SYST M, V2, P28
  • [4] Barash Y, 2003, P 7 ANN INT C COMP M, P28
  • [5] Efficient exact p-value computation for small sample, sparse, and surprising categorical data
    Bejerano, G
    Friedman, N
    Tishby, N
    Journal of Computational Biology, 2004, 11(5): 867-886
  • [6] Branch and bound computation of exact p-values
    Bejerano, Gill
    Bioinformatics, 2006, 22(17): 2158-2159
  • [7] Additivity in protein-DNA interactions: how good an approximation is it?
    Benos, PV
    Bulyk, ML
    Stormo, GD
    Nucleic Acids Research, 2002, 30(20): 4442-4451
  • [8] Probabilistic code for DNA recognition by proteins of the EGR family
    Benos, PV
    Lapedes, AS
    Stormo, GD
    Journal of Molecular Biology, 2002, 323(4): 701-727
  • [9] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    Nucleic Acids Research, 2000, 28(1): 235-242
  • [10] Weight matrix descriptions of 4 eukaryotic RNA polymerase-II promoter elements derived from 502 unrelated promoter sequences
    Bucher, P
    Journal of Molecular Biology, 1990, 212(4): 563-578