HIDDEN MARKOV MODELS OF BIOLOGICAL PRIMARY SEQUENCE INFORMATION

Cited by: 264
Authors
BALDI, P
CHAUVIN, Y
HUNKAPILLER, T
MCCLURE, MA
Affiliations
[1] NETID INC, SAN FRANCISCO, CA 94107 USA
[2] UNIV WASHINGTON, DEPT MOLEC BIOTECHNOL, SEATTLE, WA 98195 USA
[3] UNIV CALIF IRVINE, DEPT ECOL & EVOLUT BIOL, IRVINE, CA 92717 USA
[4] JET PROP LAB, PASADENA, CA 91109 USA
[5] STANFORD UNIV, DEPT PSYCHOL, STANFORD, CA 94025 USA
Keywords
MULTIPLE SEQUENCE ALIGNMENTS; PROTEIN MODELING; ADAPTIVE ALGORITHMS; SEQUENCE CLASSIFICATION
DOI
10.1073/pnas.91.3.1059
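Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract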
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences.
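The cost claim in the abstract can be illustrated with a minimal sketch of the standard forward algorithm on a toy left-to-right HMM. This is not the paper's learning algorithm or its model architecture; the function name `forward_log_likelihood`, the single shared `stay` probability, and the dictionary-based emission tables are all assumptions made for illustration. Scoring one sequence of length T against S states takes about S*T update steps, so K sequences with S and T both near N cost on the order of K*N^2.

```python
import math


def forward_log_likelihood(seq, emit, stay):
    """Log-probability of `seq` under a toy left-to-right HMM (illustrative only).

    emit: list of dicts, emit[i][symbol] = emission probability in state i.
    stay: probability of remaining in the same state; 1 - stay advances to
          the next state. The last state always stays (absorbing).
    Returns (log_likelihood, number_of_state_updates).
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    n_states = len(emit)
    # alpha[i] is proportional to P(symbols so far, current state = i).
    alpha = [0.0] * n_states
    alpha[0] = emit[0].get(seq[0], 0.0)  # the model always starts in state 0
    log_like = 0.0
    ops = 0
    for t, sym in enumerate(seq):
        if t > 0:
            new = [0.0] * n_states
            for i in range(n_states):
                p_stay = 1.0 if i == n_states - 1 else stay
                total = alpha[i] * p_stay
                if i > 0:
                    total += alpha[i - 1] * (1.0 - stay)
                new[i] = total * emit[i].get(sym, 0.0)
                ops += 1  # one constant-time update per (state, position)
            alpha = new
        # Rescale at every position to avoid underflow on long sequences.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf"), ops
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like, ops


if __name__ == "__main__":
    # Hypothetical two-state model over a two-letter alphabet.
    model = [{"A": 0.9, "B": 0.1}, {"A": 0.2, "B": 0.8}]
    family = ["AAB", "ABB", "AAAB"]
    for s in family:
        print(s, forward_log_likelihood(s, model, stay=0.5))
```

Because each sequence is scored independently against the same model, the total work is the sum of the per-sequence costs, which is where the linear dependence on the number of sequences comes from.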
Pages: 1059-1063
Page count: 5