Model-based clustering with envelopes

Cited by: 3
Authors
Wang, Wenjing [1]
Zhang, Xin [1]
Mai, Qing [1]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
Funding
National Science Foundation (USA)
Keywords
Clustering; computational statistics; dimension reduction; envelope methods; Gaussian mixture models; sufficient dimension reduction; discriminant analysis; EM algorithm; mixture; convergence
DOI
10.1214/19-EJS1652
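Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract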
Clustering analysis is an important unsupervised learning technique in multivariate statistics and machine learning. In this paper, we propose a set of new mixture models called CLEMM (short for Clustering with Envelope Mixture Models) that is based on the widely used Gaussian mixture model assumptions and the nascent research area of envelope methodology. Formulated mostly for regression models, envelope methodology aims for simultaneous dimension reduction and efficient parameter estimation, and includes a very recent formulation of the envelope discriminant subspace for classification and discriminant analysis. Motivated by the envelope discriminant subspace pursuit in classification, we consider parsimonious probabilistic mixture models where the cluster analysis can be improved by projecting the data onto a latent lower-dimensional subspace. The proposed CLEMM framework and the associated envelope-EM algorithms thus provide foundations for envelope methods in unsupervised and semi-supervised learning problems. Numerical studies on simulated data and two benchmark data sets show significant improvement of our proposed methods over classical methods such as Gaussian mixture models, K-means and hierarchical clustering algorithms. An R package is available at https://github.com/kusakehan/CLEMM.
Pages: 82-109
Number of pages: 28