An expectation maximization algorithm for training hidden substitution models

Cited by: 62
Authors
Holmes, I [1]
Rubin, GM [1]
Affiliations
[1] Univ Calif Berkeley, Howard Hughes Med Inst, Berkeley, CA 94720 USA
Keywords
molecular evolution; bioinformatics; amino acid substitution rates; Markov models
DOI
10.1006/jmbi.2002.5405
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We derive an expectation maximization algorithm for maximum-likelihood training of substitution rate matrices from multiple sequence alignments. The algorithm can be used to train hidden substitution models, where the structural context of a residue is treated as a hidden variable that can evolve over time. We used the algorithm to train hidden substitution matrices on protein alignments in the Pfam database. Measuring the accuracy of multiple alignment algorithms with reference to BAliBASE (a database of structural reference alignments), our substitution matrices consistently outperform the PAM series, with the improvement steadily increasing as up to four hidden site classes are added. We discuss several applications of this algorithm in bioinformatics. © 2002 Elsevier Science Ltd.
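As a rough illustration of the hidden substitution model described in the abstract (not the paper's EM training procedure), the sketch below builds a rate matrix over composite (hidden class, residue) states and computes substitution probabilities as the matrix exponential exp(Rt). Everything here is an assumption made for illustration: the function names, the two-residue alphabet, the two hidden classes, the numeric rates, and the simplification that a class switch leaves the residue unchanged at a single uniform rate.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=30, squarings=10):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, scaled)
        term = [[x / k for x in row] for row in term]  # now scaled^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def hidden_rate_matrix(class_rates, switch_rate):
    """Rate matrix over composite states indexed as cls * n_res + residue.

    class_rates[c][i][j] is the rate of residue i -> j inside hidden class c.
    switch_rate is a toy assumption: every class change keeps the residue
    and occurs at the same rate.
    """
    n_cls, n_res = len(class_rates), len(class_rates[0])
    n = n_cls * n_res
    rates = [[0.0] * n for _ in range(n)]
    for c in range(n_cls):
        for i in range(n_res):
            src = c * n_res + i
            for j in range(n_res):
                if j != i:
                    rates[src][c * n_res + j] = class_rates[c][i][j]
            for d in range(n_cls):
                if d != c:
                    rates[src][d * n_res + i] = switch_rate
            rates[src][src] = -sum(rates[src])  # each row sums to zero
    return rates


def transition_probs(rates, t):
    """P(t) = exp(R t) for a continuous-time Markov chain."""
    return mat_exp([[x * t for x in row] for row in rates])


def observed_probs(probs, n_cls, n_res, src):
    """Sum over destination classes, which are hidden from the observer."""
    return [sum(probs[src][d * n_res + j] for d in range(n_cls))
            for j in range(n_res)]


# Hypothetical example: class 0 evolves slowly, class 1 evolves quickly.
slow = [[0.0, 0.1], [0.1, 0.0]]
fast = [[0.0, 1.0], [1.0, 0.0]]
R = hidden_rate_matrix([slow, fast], switch_rate=0.05)
P = transition_probs(R, t=1.0)
from_slow = observed_probs(P, 2, 2, src=0)  # start: class 0, residue 0
from_fast = observed_probs(P, 2, 2, src=2)  # start: class 1, residue 0
```

Under these toy rates, a site starting in the fast class is more likely to show a different residue after the same time interval than a site starting in the slow class, which is the kind of site heterogeneity a hidden class is meant to capture.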
Pages: 753 - 764
Page count: 12
Related papers
50 in total
  • [21] A characterization of HRV's nonlinear hidden dynamics by means of Markov models
    Silipo, R
    Deco, G
    Vergassola, R
    Gremigni, C
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 1999, 46 (08) : 978 - 986
  • [22] Detecting homogeneous segments in DNA sequences by using hidden Markov models
    Boys, RJ
    Henderson, DA
    Wilkinson, DJ
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2000, 49 : 269 - 285
  • [23] Analysis of panel data under hidden mover-stayer models
    Yi, Grace Y.
    He, Wenqing
    He, Feng
    STATISTICS IN MEDICINE, 2017, 36 (20) : 3231 - 3243
  • [24] Analysis of complex neural circuits with nonlinear multidimensional hidden state models
    Friedman, Alexander
    Slocum, Joshua F.
    Tyulmankov, Danil
    Gibb, Leif G.
    Altshuler, Alex
    Ruangwises, Suthee
    Shi, Qinru
    Arana, Sebastian E. Toro
    Beck, Dirk W.
    Sholes, Jacquelyn E. C.
    Graybiel, Ann M.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2016, 113 (23) : 6538 - 6543
  • [25] Application of hidden Markov models to biological data mining: A case study
    Yin, MM
    Wang, JTL
    DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS, AND TECHNOLOGY II, 2000, 4057 : 352 - 358
  • [26] A DIRICHLET PROCESS MIXTURE OF HIDDEN MARKOV MODELS FOR PROTEIN STRUCTURE PREDICTION
    Lennox, Kristin P.
    Dahl, David B.
    Vannucci, Marina
    Day, Ryan
    Tsai, Jerry W.
    ANNALS OF APPLIED STATISTICS, 2010, 4 (02) : 916 - 942
  • [27] Fuzzy Hidden Markov Models: A New Approach In Multiple Sequence Alignment
    Collyda, Chrysa
    Diplaris, Sotiris
    Mitkas, Pericles A.
    Maglaveras, Nicos
    Pappas, Costas
    UBIQUITY: TECHNOLOGIES FOR BETTER HEALTH IN AGING SOCIETIES, 2006, 124 : 99 - +
  • [28] Predictive downscaling based on non-homogeneous hidden Markov models
    Khalil, Abedalrazq F.
    Kwon, Hyun-Han
    Lall, Upmanu
    Kaheil, Yasir H.
    HYDROLOGICAL SCIENCES JOURNAL-JOURNAL DES SCIENCES HYDROLOGIQUES, 2010, 55 (03) : 333 - 350
  • [29] Distribution of Statistics of Hidden State Sequences Through the Sum-Product Algorithm
    Martin, Donald E. K.
    Aston, John A. D.
    METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY, 2013, 15 (04) : 897 - 918
  • [30] Standard Codon Substitution Models Overestimate Purifying Selection for Nonstationary Data
    Kaehler, Benjamin D.
    Yap, Von Bing
    Huttley, Gavin A.
    GENOME BIOLOGY AND EVOLUTION, 2017, 9 (01) : 134 - 149