A DIRICHLET PROCESS MIXTURE OF HIDDEN MARKOV MODELS FOR PROTEIN STRUCTURE PREDICTION

Cited by: 18
Authors
Lennox, Kristin P. [1]
Dahl, David B. [1]
Vannucci, Marina [2]
Day, Ryan [3]
Tsai, Jerry W. [3]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Rice Univ, Dept Stat, Houston, TX 77251 USA
[3] Univ Pacific, Dept Chem, Stockton, CA 95211 USA
Keywords
Bayesian nonparametrics; density estimation; dihedral angles; protein structure prediction; torsion angles; von Mises distribution; nonparametric problems; probabilistic model; bioinformatics; distributions; inference; genomics; database; angles
DOI
10.1214/09-AOAS296
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
By providing new insights into the distribution of a protein's torsion angles, recent statistical models for these data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is structure prediction for the highly variable loop and turn regions. Such modeling is difficult because the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data are "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.
Pages: 916-942
Page count: 27