Weighting hidden Markov models for maximum discrimination

Cited by: 23
Authors
Karchin, R [1]
Hughey, R [1]
Affiliations
[1] Univ Calif Santa Cruz, Jack Baskin Sch Engn, Dept Comp Engn, Santa Cruz, CA 95064 USA
DOI
10.1093/bioinformatics/14.9.772
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Hidden Markov models can efficiently and automatically build statistical representations of related sequences. Unfortunately, training sets are frequently biased toward one subgroup of sequences, leading to an insufficiently general model. This work evaluates sequence weighting methods based on the maximum-discrimination idea. Results: One good method scales sequence weights by an exponential that ranges between 0.1 for the best-scoring sequence and 1.0 for the worst. Experiments with a curated data set show that while training with one or two sequences performed worse than single-sequence Probabilistic Smith-Waterman (PSW), training with five or ten sequences reduced errors by 20% and 51%, respectively. This new version of the SAM HMM suite outperforms HMMer (17% reduction over PSW for 10 training sequences), Meta-MEME (28% reduction), and unweighted SAM (31% reduction).
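The weighting idea in the Results can be illustrated with a minimal sketch. The abstract states only that the weight is an exponential ranging from 0.1 (best-scoring sequence) to 1.0 (worst); the exact functional form is not given here, so the linear rescaling of scores, the function name `discrimination_weights`, and the convention that a higher score means a better fit are all assumptions made for illustration and may differ from what SAM actually does.

```python
import math

def discrimination_weights(scores, w_best=0.1, w_worst=1.0):
    """Hypothetical helper: map per-sequence scores to training weights.

    Assumes a higher score means the sequence already fits the model well,
    so it receives a smaller weight in the next training pass.
    """
    best, worst = max(scores), min(scores)
    if best == worst:
        # All sequences score the same: no basis for discrimination.
        return [w_worst] * len(scores)
    # Decay rate chosen so the best sequence lands exactly on w_best.
    rate = math.log(w_worst / w_best)
    weights = []
    for s in scores:
        rel = (s - worst) / (best - worst)  # 0 for the worst, 1 for the best
        weights.append(w_worst * math.exp(-rate * rel))
    return weights

# Illustrative scores only, not data from the paper.
weights = discrimination_weights([10.0, 5.0, 0.0])
```

Under these assumptions, poorly scoring sequences keep a weight near 1.0 and pull the model toward themselves, while well-represented sequences are down-weighted, which is how weighting can counter a training set biased toward one subgroup.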
Pages: 772-782
Page count: 11