An expectation maximization algorithm for training hidden substitution models

被引:62
作者
Holmes, I [1 ]
Rubin, GM [1 ]
机构
[1] Univ Calif Berkeley, Howard Hughes Med Inst, Berkeley, CA 94720 USA
关键词
molecular evolution; bioinformatics; amino acid substitution rates; Markov models;
D O I
10.1006/jmbi.2002.5405
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
We derive an expectation maximization algorithm for maximum-likelihood training of substitution rate matrices from multiple sequence alignments. The algorithm can be used to train hidden substitution models, where the structural context of a residue is treated as a hidden variable that can evolve over time. We used the algorithm to train hidden substitution matrices on protein alignments in the Pfam database. Measuring the accuracy of multiple alignment algorithms with reference to BAli-BASE (a database of structural reference alignments) our substitution matrices consistently outperform the PAM series, with the improvement steadily increasing as up to four hidden site classes are added. We discuss several applications of this algorithm in bioinformatics. (C) 2002 Elsevier Science Ltd.
引用
收藏
页码:753 / 764
页数:12
相关论文
共 50 条
[1]   A Markov chain Monte Carlo expectation maximization algorithm for statistical analysis of DNA sequence evolution with neighbor-dependent substitution rates [J].
Hobolth, Asger .
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2008, 17 (01) :138-162
[2]   DNA sequence classification via an expectation maximization algorithm and neural networks: A case study [J].
Ma, QC ;
Wang, JTL ;
Shasha, D ;
Wu, CH .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2001, 31 (04) :468-475
[3]   Expectation Maximization of Frequent Patterns, a Specific, Local, Pattern-Based Biclustering Algorithm for Biological Datasets [J].
Moore, Erin Jessica ;
Bourlai, Thirmachos .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2016, 13 (05) :812-824
[4]   Next State Prediction Algorithm for the Avionic systems using the Hidden Markov Models [J].
Lokesh, M. R. ;
Kumaraswamy, Y. S. .
2015 International Conference on Green Computing and Internet of Things (ICGCIoT), 2015, :1618-1623
[5]   Expectation-Maximization enables Phylogenetic Dating under a Categorical Rate Model [J].
Mai, Uyen ;
Charvel, Eduardo ;
Mirarab, Siavash .
SYSTEMATIC BIOLOGY, 2024, 73 (05) :823-838
[6]   Estimation of rates-across-sites distributions in phylogenetic substitution models [J].
Susko, E ;
Field, C ;
Blouin, C ;
Roger, AJ .
SYSTEMATIC BIOLOGY, 2003, 52 (05) :594-603
[7]   Investigating the Dynamics of Prisoner’s Dilemma Models with Hidden Markov Models [J].
Pivovarchuk D.G. ;
Rovenskaya E.A. .
Computational Mathematics and Modeling, 2015, 26 (1) :120-133
[8]   Degree of entanglement in Entangled Hidden Markov Models [J].
Accardi, Luigi ;
Souissi, Abdessatar ;
Soueidi, El Gheteb ;
Rhaima, Mohamed .
CHAOS SOLITONS & FRACTALS, 2025, 196
[9]   Hidden Markov Models for Automated Protocol Learning [J].
Whalen, Sean ;
Bishop, Matt ;
Crutchfield, James P. .
SECURITY AND PRIVACY IN COMMUNICATION NETWORKS, 2010, 50 :415-428
[10]   COPULA GAUSSIAN GRAPHICAL MODELS WITH HIDDEN VARIABLES [J].
Yu, Hang ;
Dauwels, Justin ;
Wang, Xueou .
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, :2177-2180