A DIRICHLET PROCESS MIXTURE OF HIDDEN MARKOV MODELS FOR PROTEIN STRUCTURE PREDICTION

Cited by: 18
Authors
Lennox, Kristin P. [1 ]
Dahl, David B. [1 ]
Vannucci, Marina [2 ]
Day, Ryan [3 ]
Tsai, Jerry W. [3 ]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Rice Univ, Dept Stat, Houston, TX 77251 USA
[3] Univ Pacific, Dept Chem, Stockton, CA 95211 USA
Keywords
Bayesian nonparametrics; density estimation; dihedral angles; protein structure prediction; torsion angles; von Mises distribution; VON-MISES DISTRIBUTION; NONPARAMETRIC PROBLEMS; PROBABILISTIC MODEL; DENSITY-ESTIMATION; BIOINFORMATICS; DISTRIBUTIONS; INFERENCE; GENOMICS; DATABASE; ANGLES;
DOI
10.1214/09-AOAS296
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
By providing new insights into the distribution of a protein's torsion angles, recent statistical models for these data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is structure prediction for the highly variable loop and turn regions. Such modeling is difficult because the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data are "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.
Pages: 916-942
Page count: 27