Addressing overfitting and underfitting in Gaussian model-based clustering

Cited by: 31
Author
Andrews, Jeffrey L. [1]
Affiliation
[1] Univ British Columbia, Irving K Barber Sch Arts & Sci, Dept Stat, Okanagan Campus,1177 Res Rd, Kelowna, BC V1V 1V7, Canada
Funding
Canada Foundation for Innovation; Natural Sciences and Engineering Research Council of Canada;
Keywords
EM algorithm; Bootstrap; Cluster analysis; Mixture models; HIGH-DIMENSIONAL DATA; MAXIMUM-LIKELIHOOD; EM ALGORITHM; FINITE MIXTURE; FIRE BEHAVIOR; MULTIVARIATE; BOOTSTRAP; VALUES;
DOI
10.1016/j.csda.2018.05.015
Chinese Library Classification number
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
The expectation-maximization (EM) algorithm is a common approach for parameter estimation in the context of cluster analysis using finite mixture models. This approach suffers from the well-known issue of convergence to local maxima, but also the less obvious problem of overfitting. These combined, and competing, concerns are illustrated through simulation and then addressed by introducing an algorithm that augments the traditional EM with the nonparametric bootstrap. Further simulations and applications to real data lend support for the usage of this bootstrap augmented EM-style algorithm to avoid both overfitting and local maxima. (c) 2018 Elsevier B.V. All rights reserved.
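The abstract names two ingredients, EM for finite Gaussian mixtures and the nonparametric bootstrap, without giving the algorithm itself. The sketch below only illustrates those two ingredients separately on simulated univariate data; it is not the paper's bootstrap-augmented algorithm. All names (`em_fit`, `bootstrap_sample`, the starting values, the variance floor) are hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Observed-data log-likelihood of a univariate Gaussian mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_fit(data, init_means, n_iter=50, var_floor=1e-3):
    """Plain EM for a univariate Gaussian mixture (illustrative sketch only).

    var_floor is an assumed safeguard against degenerate, zero-variance
    components; it is not taken from the paper.
    """
    n, k = len(data), len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of the parameters.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history


def bootstrap_sample(data, rng):
    """Nonparametric bootstrap: resample the data with replacement."""
    return rng.choices(data, k=len(data))


# Simulated data: two well-separated components (assumed for the demo).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + \
       [rng.gauss(5.0, 1.0) for _ in range(150)]

fit_weights, fit_means, fit_vars, loglik_history = em_fit(data, [-1.0, 6.0])

# Refit on bootstrap resamples to see how much the estimated means vary.
boot_means = [
    sorted(em_fit(bootstrap_sample(data, rng), [-1.0, 6.0])[1])
    for _ in range(20)
]
print("fitted means:", sorted(fit_means))
print("first bootstrap refit means:", boot_means[0])
```

The log-likelihood recorded in `loglik_history` should not decrease between iterations, which is the property that lets EM settle at a local rather than a global maximum. Refitting on resampled data gives a rough view of how sensitive the estimates are to the particular sample.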
Pages: 160-171
Number of pages: 12