clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R

Cited by: 36
Authors
Scrucca, Luca [1]
Raftery, Adrian E. [2]
Affiliations
[1] Univ Perugia, Dept Econ, Via A Pascoli 20, I-06123 Perugia, Italy
[2] Univ Washington, Dept Stat, Box 354320, Seattle, WA 98195 USA
Source
Journal of Statistical Software | 2018 / Vol. 84 / Issue 01
Keywords
BIC; model-based clustering; R; subset selection
DOI
10.18637/jss.v084.i01
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Finite mixture modeling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provides clustering information. Selecting such a subset yields a more parsimonious model, with more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package clustvarsel, which performs subset selection for model-based clustering. An improved version of the Raftery and Dean (2006) methodology is implemented in the new release of the package to find the (locally) optimal subset of variables with group/cluster information in a dataset. Search over the solution space is performed using either a stepwise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms are discussed, as well as a parallel implementation of the stepwise search. Usage of the package is presented through the discussion of several data examples.
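The two search strategies named in the abstract can be illustrated with a minimal, language-agnostic sketch, written here in Python. It is not the package's code or API: the function `stepwise_select`, the `score` callable, and the thresholds `upper` and `lower` are hypothetical names introduced for illustration. In the package the comparison between candidate subsets is based on BIC; in this sketch a toy scoring function stands in for it, with a positive value meaning that a variable appears to add clustering information beyond the variables already selected.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical scoring interface (an assumption, not the package API):
# score(var, others) > 0 means evidence that `var` adds clustering
# information beyond the variables already in `others`.
ScoreFn = Callable[[str, FrozenSet[str]], float]


def stepwise_select(variables: Sequence[str], score: ScoreFn,
                    headlong: bool = False, upper: float = 0.0,
                    lower: float = -10.0, max_iter: int = 100) -> List[str]:
    """Forward stepwise subset search with inclusion and removal steps."""
    selected: List[str] = []
    pool: List[str] = list(variables)
    for _ in range(max_iter):
        changed = False

        # Inclusion step.
        current = frozenset(selected)
        pick = None
        if headlong:
            # Headlong: take the first candidate that clears `upper` and
            # permanently discard candidates that fall below `lower`.
            for var in list(pool):
                value = score(var, current)
                if value > upper:
                    pick = var
                    break
                if value < lower:
                    pool.remove(var)
        elif pool:
            # Greedy: evaluate every candidate and keep the best one.
            best = max(pool, key=lambda v: score(v, current))
            if score(best, current) > upper:
                pick = best
        if pick is not None:
            pool.remove(pick)
            selected.append(pick)
            changed = True

        # Removal step: drop the weakest selected variable if, given the
        # other selected variables, its score falls below `upper`.
        if selected:
            full = frozenset(selected)
            worst = min(selected, key=lambda v: score(v, full - {v}))
            value = score(worst, full - {worst})
            if value < upper:
                selected.remove(worst)
                changed = True
                if not headlong or value >= lower:
                    pool.append(worst)  # may be reconsidered later

        if not changed:
            break  # neither step changed the subset: local optimum
    return selected


def toy_score(var: str, others: FrozenSet[str]) -> float:
    # Toy values: x2 is informative alone but redundant once x1 is selected.
    if var == "x2":
        return -1.0 if "x1" in others else 3.0
    return {"x1": 5.0, "x3": -2.0, "x4": -4.0}[var]


names = ["x2", "x1", "x3", "x4"]
greedy_result = stepwise_select(names, toy_score)
headlong_result = stepwise_select(names, toy_score, headlong=True, lower=-3.0)
print(greedy_result, headlong_result)
```

With these toy scores both searches end with only `x1` selected. The headlong run first accepts `x2` because it is the first candidate to clear the threshold, then discards it in a removal step once `x1` has entered. The sketch omits details of the published method, such as how the BIC comparison is computed and any special handling of the first steps.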
Pages: 1-28
Number of pages: 28
Related papers
46 in total
  • [1] Andrews J.L., 2013, vscc: Variable selection for clustering and classification
  • [2] Variable Selection for Clustering and Classification
    Andrews, Jeffrey L.
    McNicholas, Paul D.
    [J]. JOURNAL OF CLASSIFICATION, 2014, 31 (02) : 136 - 153
  • [3] [Anonymous], 1973, Association Scientifique International du Cafe, 6th International Colloquium on Coffee Chemistry
  • [4] [Anonymous], 1967, P APRIL 18 20 1967 S, DOI DOI 10.1145/1465482.1465560
  • [5] [Anonymous], 1992, COMPUTATION STAT
  • [6] [Anonymous], R LANG ENV STAT COMP
  • [7] [Anonymous], 2015, pgmm: Parsimonious Gaussian mixture models. R package version 1.2
  • [8] [Anonymous], 2012, Technical Report No. 597
  • [9] [Anonymous], 2015, FOREACH PROVIDES FOR
  • [10] MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING
    BANFIELD, JD
    RAFTERY, AE
    [J]. BIOMETRICS, 1993, 49 (03) : 803 - 821