HIDDEN MARKOV MODELS OF BIOLOGICAL PRIMARY SEQUENCE INFORMATION

Cited by: 263
Authors
BALDI, P
CHAUVIN, Y
HUNKAPILLER, T
MCCLURE, MA
Affiliations
[1] NETID INC, SAN FRANCISCO, CA 94107 USA
[2] UNIV WASHINGTON, DEPT MOLEC BIOTECHNOL, SEATTLE, WA 98195 USA
[3] UNIV CALIF IRVINE, DEPT ECOL & EVOLUT BIOL, IRVINE, CA 92717 USA
[4] JET PROP LAB, PASADENA, CA 91109 USA
[5] STANFORD UNIV, DEPT PSYCHOL, STANFORD, CA 94025 USA
Keywords
MULTIPLE SEQUENCE ALIGNMENTS; PROTEIN MODELING; ADAPTIVE ALGORITHMS; SEQUENCE CLASSIFICATION
DOI
10.1073/pnas.91.3.1059
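Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract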
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm that requires O(KN^2) operations, linear in the number of sequences.
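
The O(KN^2) figure can be made concrete with a small sketch. The abstract does not spell out the alignment procedure, so the code below is only an illustration under stated assumptions: it uses standard Viterbi decoding on a toy left-to-right model whose states each have a constant number of outgoing transitions. Under that assumption, decoding one sequence of length N against a model of comparable length visits on the order of N^2 (position, transition) pairs, and aligning K sequences independently to the same model costs on the order of KN^2. The state names (m0, m1, m2), the four-letter alphabet, and all probabilities are hypothetical and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (most probable state path, its log-probability) for obs.

    log_trans[s] lists only the allowed successors of s, so the work per
    observed symbol is proportional to the number of listed transitions.
    """
    # best[s]: log-probability of the best path ending in state s so far
    best = {s: log_start.get(s, NEG_INF) + log_emit[s].get(obs[0], NEG_INF)
            for s in states}
    back = []  # one dict of predecessor pointers per symbol after the first
    for symbol in obs[1:]:
        col = {s: NEG_INF for s in states}
        ptr = {s: None for s in states}
        for prev, score in best.items():
            if score == NEG_INF:
                continue  # state not reachable yet
            for nxt, log_t in log_trans.get(prev, {}).items():
                cand = score + log_t + log_emit[nxt].get(symbol, NEG_INF)
                if cand > col[nxt]:
                    col[nxt], ptr[nxt] = cand, prev
        best = col
        back.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


def emit(preferred):
    # Hypothetical emission table: the preferred letter gets 0.7, others 0.1
    return {c: math.log(0.7 if c == preferred else 0.1) for c in "ACGT"}


# Toy left-to-right chain (assumed for illustration, not the paper's model)
states = ["m0", "m1", "m2"]
log_start = {"m0": 0.0}
log_trans = {
    "m0": {"m0": math.log(0.1), "m1": math.log(0.9)},
    "m1": {"m1": math.log(0.1), "m2": math.log(0.9)},
    "m2": {"m2": 0.0},
}
log_emit = {"m0": emit("A"), "m1": emit("C"), "m2": emit("G")}

path, log_p = viterbi("AACG", states, log_start, log_trans, log_emit)
print(path, math.exp(log_p))
```

Decoding each family member against the same model and lining up residues by the state they are assigned to is one way such paths could be turned into a multiple alignment; the cost grows linearly with the number of sequences because each sequence is decoded independently.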
Pages: 1059-1063
Number of pages: 5