An evaluation of the bootstrap for model validation in mixture models

Cited by: 14
Authors
Jaki, Thomas [1]
Su, Ting-Li [2]
Kim, Minjung [3]
Van Horn, M. Lee [4]
Affiliations
[1] Univ Lancaster, Dept Math & Stat, Lancaster LA1 4YF, England
[2] Univ Manchester, Div Dent, Manchester, Lancs, England
[3] Univ Alabama, Dept Psychol, Box 870348, Tuscaloosa, AL 35487 USA
[4] Univ New Mexico, Coll Educ, Albuquerque, NM 87131 USA
Keywords
Finite mixture models; Leave-k-out cross-validation; Model validation; Nonparametric Bootstrap; Regression mixture models; FINITE MIXTURES; BAYESIAN-INFERENCE; COMPONENTS; NUMBER
DOI
10.1080/03610918.2017.1303726
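Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714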
Abstract
Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and for demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models. The cause of the problem is that, when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problems, we show that leave-k-out cross-validation, where sub-samples are taken without replacement, does not suffer from the same problem.
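The contrast between the two resampling schemes described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation code: the function names, the sample size `n`, the number of held-out observations `k`, and the number of repetitions `reps` are all assumptions chosen for illustration, and no mixture model is fitted. The sketch only shows that sampling with replacement repeats some observations, whereas leave-k-out sub-sampling without replacement never does.

```python
import random
from collections import Counter


def bootstrap_sample(n, rng):
    """Draw n indices from range(n) with replacement (non-parametric bootstrap)."""
    return [rng.randrange(n) for _ in range(n)]


def leave_k_out_sample(n, k, rng):
    """Draw n - k distinct indices from range(n), i.e. without replacement."""
    return rng.sample(range(n), n - k)


def max_multiplicity(indices):
    """Largest number of times any single observation appears in the sample."""
    return max(Counter(indices).values())


# Hypothetical settings, chosen only for illustration.
rng = random.Random(0)
n, k, reps = 500, 50, 200

boot_max = [max_multiplicity(bootstrap_sample(n, rng)) for _ in range(reps)]
lko_max = [max_multiplicity(leave_k_out_sample(n, k, rng)) for _ in range(reps)]

print("bootstrap, range of max multiplicity:", min(boot_max), max(boot_max))
print("leave-k-out, range of max multiplicity:", min(lko_max), max(lko_max))
```

In every bootstrap sample at least one observation appears more than once, so a moderately extreme observation can be replicated several times. In the leave-k-out samples every observation appears at most once, which is the property the abstract identifies as avoiding the extraction of spurious classes.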
Pages: 1028-1038
Number of pages: 11