An evaluation of the bootstrap for model validation in mixture models

Cited by: 14

Authors:

Jaki, Thomas [1]

Su, Ting-Li [2]

Kim, Minjung [3]

Van Horn, M. Lee [4]

Affiliations:

[1] Univ Lancaster, Dept Math & Stat, Lancaster LA1 4YF, England

[2] Univ Manchester, Div Dent, Manchester, Lancs, England

[3] Univ Alabama, Dept Psychol, Box 870348, Tuscaloosa, AL 35487 USA

[4] Univ New Mexico, Coll Educ, Albuquerque, NM 87131 USA

Source:

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION | 2018, Vol. 47, No. 4

Keywords:

Finite mixture models; Leave-k-out cross-validation; Model validation; Nonparametric Bootstrap; Regression mixture models; FINITE MIXTURES; BAYESIAN-INFERENCE; COMPONENTS; NUMBER;

DOI:

10.1080/03610918.2017.1303726

Chinese Library Classification (CLC):

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models. The cause of the problem is that when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problems, we show that leave-k-out cross-validation, where sub-samples are taken without replacement, does not suffer from the same problem.
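
The resampling mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function names, the sample size `n = 500`, and `k = 50` are hypothetical choices made only for illustration. The sketch contrasts drawing indices with replacement (non-parametric bootstrap) against drawing without replacement (leave-k-out), and computes the exact probability that any given observation appears at least twice in one bootstrap sample.

```python
import random
from collections import Counter


def bootstrap_sample(n, rng):
    """Draw n indices WITH replacement (non-parametric bootstrap)."""
    return [rng.randrange(n) for _ in range(n)]


def leave_k_out_sample(n, k, rng):
    """Draw n - k indices WITHOUT replacement (leave-k-out sub-sample)."""
    return rng.sample(range(n), n - k)


def max_multiplicity(indices):
    """Largest number of times any single observation was drawn."""
    return max(Counter(indices).values())


def prob_duplicated(n):
    """Exact probability that one fixed observation is drawn at least
    twice in a bootstrap sample of size n (binomial with p = 1/n)."""
    p_zero = (1 - 1 / n) ** n        # drawn zero times
    p_once = (1 - 1 / n) ** (n - 1)  # drawn once: n * (1/n) * (1 - 1/n)^(n-1)
    return 1 - p_zero - p_once


# Hypothetical settings, chosen only for illustration.
rng = random.Random(0)
n, k = 500, 50
boot = bootstrap_sample(n, rng)
sub = leave_k_out_sample(n, k, rng)

print("max copies of one observation, bootstrap:  ", max_multiplicity(boot))
print("max copies of one observation, leave-k-out:", max_multiplicity(sub))
print("P(given observation drawn >= 2 times):     ", round(prob_duplicated(n), 3))
```

For large `n` the duplication probability approaches 1 - 2/e, so roughly a quarter of the observations, including any moderately extreme ones, are expected to be replicated in each bootstrap sample, whereas a leave-k-out sub-sample never contains a replicate.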

Pages: 1028-1038

Page count: 11
