A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data

Cited by: 3

Authors:

Ranalli, Monia [1]

Rocci, Roberto [2]

Affiliations:

[1] Penn State Univ, University Pk, PA 16802 USA

[2] Univ Tor Vergata, Rome, Italy

Source:

PSYCHOMETRIKA | 2017, Vol. 82, No. 4

Keywords:

mixture models; reduction ordinal data; composite likelihood; STRUCTURAL EQUATION MODELS; VARIABLE SELECTION; MIXTURE-MODELS; LIKELIHOOD; EXTENSION; INVARIANCE; ANALYZERS; CRITERIA;

DOI:

10.1007/s11336-017-9578-5

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Subject classification code:

0701 ; 070101 ;

Abstract:

The literature on clustering continuous data is rich and wide; by contrast, the literature on clustering categorical data is still limited. In some cases, the clustering problem is made more difficult by noise variables or dimensions that carry no information about the clustering structure and could mask it. The aim of this paper is to propose a model for simultaneous clustering and dimensionality reduction of ordered categorical data that detects the discriminative dimensions and discards the noise ones. Following the underlying response variable approach, the observed variables are treated as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To separate discriminative from noise dimensions, these latent variables are taken to be linear combinations of two independent sets of second-order latent variables, only one of which contains information about the cluster structure, while the other contains the noise dimensions. The model specification involves multidimensional integrals that make maximum likelihood estimation cumbersome and in some cases infeasible. To overcome this issue, the parameters are estimated through an EM-like algorithm that maximizes a composite log-likelihood based on low-dimensional margins. Applications to real and simulated data show the effectiveness of the proposal.
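The data-generating idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `simulate_ordinal`, the loadings, thresholds, means, and the unit-variance choice are all hypothetical values chosen for illustration, and the sketch covers only the simulation side (discretizing a latent Gaussian mixture), not the composite-likelihood EM-like estimation.

```python
import bisect
import random


def simulate_ordinal(n, weights, means, loadings, thresholds, seed=0):
    """Draw n ordinal observations from a toy underlying-response-variable model.

    Every parameter here is a hypothetical illustration value, not taken
    from the paper.
    """
    rng = random.Random(seed)
    n_vars = len(loadings)                # observed ordinal variables
    n_disc = len(means[0])                # discriminative second-order dimensions
    n_noise = len(loadings[0]) - n_disc   # noise second-order dimensions
    data, labels = [], []
    for _ in range(n):
        # Pick a mixture component according to the mixing weights.
        g = rng.choices(range(len(weights)), weights=weights)[0]
        # Discriminative dimensions: component-specific mean (unit variance assumed).
        disc = [rng.gauss(mu, 1.0) for mu in means[g]]
        # Noise dimensions: the same standard normal in every component.
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        second_order = disc + noise
        # First-order latent variables are linear combinations of both sets.
        first_order = [
            sum(a * z for a, z in zip(row, second_order)) for row in loadings
        ]
        # Observed category = number of thresholds at or below the latent value.
        x = [
            bisect.bisect_right(thresholds[j], first_order[j])
            for j in range(n_vars)
        ]
        data.append(x)
        labels.append(g)
    return data, labels


# Two clusters, one discriminative and one noise dimension, three ordinal
# variables with four categories each (coded 0..3).
weights = [0.5, 0.5]
means = [[-2.0], [2.0]]
loadings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
thresholds = [[-1.0, 0.0, 1.0]] * 3
data, labels = simulate_ordinal(500, weights, means, loadings, thresholds, seed=1)
```

In this toy setup the third variable loads only on the noise dimension, so its categories should look alike across clusters, while the first variable separates the clusters clearly.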

Pages: 1007-1034

Page count: 28
