Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models

Cited by: 8
Authors
Mehta, Pankaj [1]
Schwab, David J. [2, 3]
Sengupta, Anirvan M. [4, 5]
Affiliations
[1] Boston Univ, Dept Phys, Boston, MA 02215 USA
[2] Princeton Univ, Dept Mol Biol, Princeton, NJ 08544 USA
[3] Princeton Univ, Lewis Sigler Inst, Princeton, NJ 08544 USA
[4] Rutgers State Univ, BioMAPS, Piscataway, NJ USA
[5] Rutgers State Univ, Dept Phys, Piscataway, NJ 08854 USA
Keywords
Bioinformatics; Hidden Markov Models; One-dimensional statistical mechanics; Fisher information; Machine learning; Protein
DOI
10.1007/s10955-010-0102-x
CLC classification number
O4 [Physics]
Subject classification code
0702
Abstract
Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
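The hard-rod picture mentioned in the abstract can be illustrated with a short recursion. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `partition_function`, the position-specific `energy` matrix, the `fugacity` parameter, and the choice of unit weight for unbound (background) bases are all hypothetical and introduced here only for illustration. Each bound site is treated as a rod of fixed length that cannot overlap another rod, and the partition function is built up base by base, which has the same structure as an HMM forward pass up to normalization.

```python
import math


def partition_function(seq, energy, site_len, fugacity):
    """Partition function of non-overlapping rods of length site_len on seq.

    energy[k][base] is an assumed binding-energy contribution (in units of kT)
    of `base` at position k within a site; unbound bases carry weight 1.
    """
    n = len(seq)
    # z[i] is the partition function restricted to the first i bases.
    z = [1.0] * (n + 1)
    for i in range(1, n + 1):
        # Case 1: base i is unbound (background).
        z[i] = z[i - 1]
        # Case 2: a rod of length site_len ends exactly at base i.
        if i >= site_len:
            start = i - site_len
            e = sum(energy[k][seq[start + k]] for k in range(site_len))
            z[i] += fugacity * math.exp(-e) * z[start]
    return z[n]


# Toy usage: zero binding energy everywhere, rods of length 2.
flat_energy = [{b: 0.0 for b in "ACGT"} for _ in range(2)]
z_toy = partition_function("ACGT", flat_energy, site_len=2, fugacity=1.0)
```

With zero energies and unit fugacity the result simply counts the ways of placing non-overlapping rods on the sequence, which gives a quick sanity check of the recursion. The low-density limit discussed in the abstract corresponds to a small `fugacity` in this sketch.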
Pages: 1187-1205
Page count: 19