Selecting hidden Markov model state number with cross-validated likelihood

Cited by: 106
Authors
Celeux, Gilles [2 ]
Durand, Jean-Baptiste [1 ]
Affiliations
[1] Grenoble Univ, Lab Jean Kuntzmann, INRIA Rhone Alpes, F-38041 Grenoble 9, France
[2] Univ Paris 11, Dept Math, INRIA Futurs, F-91405 Orsay, France
Keywords
hidden Markov models; model selection; cross-validation; missing values at random; EM algorithm
DOI
10.1007/s00180-007-0097-1
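Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract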
The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require that the sampling distribution belong to one of the competing models. However, computing cross-validated likelihood for hidden Markov models when only one training sample is available is difficult because the data are not independent. Two approaches are proposed to compute cross-validated likelihood for a hidden Markov model. The first uses a deterministic half-sampling procedure; the second adapts the EM algorithm for hidden Markov models to account for the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments highlight the promising behaviour of the deterministic half-sampling criterion.
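The half-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the details, so the following are assumptions made here for illustration: the two halves are the observations at even and odd positions, emissions are discrete symbols coded 0..M-1, and the function names (forward_loglik, fit_hmm, half_sampling_score, select_n_states) are hypothetical. The sketch relies on a standard fact: keeping every second observation of a hidden Markov model yields a hidden Markov model with the same number of states and the same emission distributions, so parameters fitted on one half can score the other.

```python
import math
import random


def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete sequence under an HMM (scaled forward pass)."""
    K = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(K)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(K)) * B[j][obs[t]]
                     for j in range(K)]
        c = sum(alpha)                      # scaling constant at time t
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
    return loglik


def _normalise(row, eps=1e-6):
    """Add a small smoothing constant and rescale the row to sum to one."""
    total = sum(row) + eps * len(row)
    return [(x + eps) / total for x in row]


def fit_hmm(obs, K, M, n_iter=30, seed=0):
    """Baum-Welch (EM) for a K-state HMM with M discrete symbols."""
    rng = random.Random(seed)
    pi = [1.0 / K] * K
    A = [_normalise([rng.random() for _ in range(K)]) for _ in range(K)]
    B = [_normalise([rng.random() for _ in range(M)]) for _ in range(K)]
    T = len(obs)
    for _ in range(n_iter):
        # E-step: scaled forward recursion.
        alpha, c = [], []
        for t in range(T):
            if t == 0:
                a = [pi[i] * B[i][obs[0]] for i in range(K)]
            else:
                a = [sum(alpha[-1][i] * A[i][j] for i in range(K)) * B[j][obs[t]]
                     for j in range(K)]
            c.append(sum(a))
            alpha.append([x / c[-1] for x in a])
        # E-step: scaled backward recursion.
        beta = [[1.0] * K for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                           for j in range(K)) / c[t + 1] for i in range(K)]
        # Expected transition and emission counts.
        trans = [[0.0] * K for _ in range(K)]
        emit = [[0.0] * M for _ in range(K)]
        for t in range(T):
            for i in range(K):
                emit[i][obs[t]] += alpha[t][i] * beta[t][i]
                if t < T - 1:
                    for j in range(K):
                        trans[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                        * beta[t + 1][j] / c[t + 1])
        # M-step: renormalise the expected counts.
        pi = _normalise([alpha[0][i] * beta[0][i] for i in range(K)])
        A = [_normalise(row) for row in trans]
        B = [_normalise(row) for row in emit]
    return pi, A, B


def half_sampling_score(obs, K, M):
    """Fit on one half (even positions), score the other half (odd positions)."""
    train, test = obs[0::2], obs[1::2]      # assumed split, see lead-in
    pi, A, B = fit_hmm(train, K, M)
    return forward_loglik(test, pi, A, B)


def select_n_states(obs, M, candidates=(1, 2, 3)):
    """Return the candidate state number with the highest held-out log-likelihood."""
    return max(candidates, key=lambda K: half_sampling_score(obs, K, M))
```

For a list obs of integer symbols, select_n_states(obs, M) compares the candidate state numbers by held-out log-likelihood. EM is run from a single random start here; in practice several starts would usually be tried, since the likelihood surface has local maxima.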
Pages: 541-564
Number of pages: 24