Hidden Markov Models with mixtures as emission distributions

Authors
Stevenn Volant
Caroline Bérard
Marie-Laure Martin-Magniette
Stéphane Robin
Affiliations
[1] UMR 518 MIA, INRA
[2] UMR MIA, AgroParisTech
[3] UMR1165 URGV, INRA
[4] UMR URGV, UEVE
[5] ERL8196 UMR URGV, CNRS
Source
Statistics and Computing | 2014 / Vol. 24
Keywords
Hidden Markov models; Model-based clustering; Mixture model; Hierarchical algorithm
Abstract
In unsupervised classification, Hidden Markov Models (HMMs) are used to account for a neighborhood structure between observations. The emission distributions are often assumed to belong to a given parametric family. In this paper, a semiparametric model in which each emission distribution is a mixture of parametric distributions is proposed to gain flexibility. We show that the standard EM algorithm can be adapted to infer the model parameters. For the initialization step, starting from a large number of components, a hierarchical method that combines them into the hidden states is proposed. Three likelihood-based criteria for selecting the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best pairing of combining criterion and model selection criterion and to evaluate classification accuracy. The proposed method is also illustrated on a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.
Pages: 493-504 (12 pages)