An expectation maximization algorithm for training hidden substitution models

Times cited: 62
Authors
Holmes, I [1]
Rubin, GM [1]
Affiliation
[1] Univ Calif Berkeley, Howard Hughes Med Inst, Berkeley, CA 94720 USA
Keywords
molecular evolution; bioinformatics; amino acid substitution rates; Markov models
DOI
10.1006/jmbi.2002.5405
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
We derive an expectation maximization algorithm for maximum-likelihood training of substitution rate matrices from multiple sequence alignments. The algorithm can be used to train hidden substitution models, in which the structural context of a residue is treated as a hidden variable that can evolve over time. We used the algorithm to train hidden substitution matrices on protein alignments in the Pfam database. When the accuracy of multiple alignment algorithms is measured against BAli-BASE (a database of structural reference alignments), our substitution matrices consistently outperform the PAM series, and the improvement increases steadily as up to four hidden site classes are added. We discuss several applications of this algorithm in bioinformatics. © 2002 Elsevier Science Ltd.
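The abstract describes hidden substitution models only in outline. The following is a minimal, dependency-free Python sketch of one way such a model can be assembled. It assumes, for illustration only, a construction that is not necessarily the paper's parameterization: each hidden site class c has its own residue rate matrix, and the hidden class switches under a separate rate matrix while the residue stays fixed. The joint generator Q over (class, residue) states is then exponentiated to give substitution probabilities P(t) = exp(Qt). The names `hidden_rate_matrix` and `expm` and the toy rates are hypothetical. The EM step that fits these rates to alignments is not shown.

```python
from typing import List

Matrix = List[List[float]]


def hidden_rate_matrix(residue_rates: List[Matrix], class_switch: Matrix) -> Matrix:
    """Joint generator over (hidden class c, residue a) states, indexed c*A + a.

    Assumed construction (illustrative): residue_rates[c] gives off-diagonal
    residue substitution rates inside class c; class_switch gives off-diagonal
    rates for changing hidden class with the residue held fixed. Input
    diagonals are ignored and recomputed so every row sums to zero.
    """
    C = len(residue_rates)
    A = len(residue_rates[0])
    n = C * A
    Q = [[0.0] * n for _ in range(n)]
    for c in range(C):
        for a in range(A):
            i = c * A + a
            # Residue substitution a -> b within hidden class c.
            for b in range(A):
                if b != a:
                    Q[i][c * A + b] += residue_rates[c][a][b]
            # Hidden class switch c -> d, residue a unchanged.
            for d in range(C):
                if d != c:
                    Q[i][d * A + a] += class_switch[c][d]
            # Diagonal entry makes the row sum to zero (a valid generator).
            Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product of X (n x k) and Y (k x m)."""
    k = len(Y)
    return [[sum(X[i][l] * Y[l][j] for l in range(k)) for j in range(len(Y[0]))]
            for i in range(len(X))]


def expm(Q: Matrix, t: float, terms: int = 20) -> Matrix:
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    # Halve the scaled matrix until its infinity norm is at most 0.5.
    norm = max(sum(abs(x) for x in row) for row in Q) * t
    s = 0
    while norm > 0.5:
        norm /= 2
        s += 1
    scale = t / (2 ** s)
    M = [[Q[i][j] * scale for j in range(n)] for i in range(n)]
    # Taylor series: I + M + M^2/2! + ... + M^terms/terms!
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    # Undo the scaling: exp(Qt) = exp(Qt / 2^s)^(2^s).
    for _ in range(s):
        result = mat_mul(result, result)
    return result


# Toy example: two residues, two hidden classes (class 0 slow, class 1 fast).
slow = [[-0.1, 0.1], [0.1, -0.1]]
fast = [[-1.0, 1.0], [1.0, -1.0]]
switch = [[-0.05, 0.05], [0.05, -0.05]]
Q = hidden_rate_matrix([slow, fast], switch)
P = expm(Q, 1.0)
```

With these toy rates, a residue in the fast class is much more likely to have changed after one time unit than the same residue in the slow class. A production implementation would more likely use a library routine such as `scipy.linalg.expm` or an eigendecomposition of Q. The hand-rolled series here only keeps the sketch self-contained.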
Pages: 753–764
Number of pages: 12