A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data

Cited by: 3
Authors
Ranalli, Monia [1]
Rocci, Roberto [2]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
[2] Univ Tor Vergata, Rome, Italy
Keywords
mixture models; reduction ordinal data; composite likelihood; STRUCTURAL EQUATION MODELS; VARIABLE SELECTION; MIXTURE-MODELS; LIKELIHOOD; EXTENSION; INVARIANCE; ANALYZERS; CRITERIA;
DOI
10.1007/s11336-017-9578-5
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
The literature on clustering for continuous data is rich and wide; by contrast, the literature on clustering for categorical data is still limited. In some cases, the clustering problem is made more difficult by the presence of noise variables/dimensions that do not contain information about the clustering structure and could mask it. The aim of this paper is to propose a model for simultaneous clustering and dimensionality reduction of ordered categorical data that detects the discriminative dimensions and discards the noise ones. Following the underlying response variable approach, the observed variables are considered as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To separate discriminative and noise dimensions, these variables are considered to be linear combinations of two independent sets of second-order latent variables, where only one set contains the information about the cluster structure while the other contains noise dimensions. The model specification involves multidimensional integrals that make maximum likelihood estimation cumbersome and in some cases infeasible. To overcome this issue, parameter estimation is carried out through an EM-like algorithm maximizing a composite log-likelihood based on low-dimensional margins. Applications to real and simulated data illustrate the effectiveness of the proposal.
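The underlying response variable idea in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' model or estimation procedure: the function name `simulate_ordinal`, the two-component setup, the shared thresholds, and the independent latent variables within each component are all assumptions made for illustration. The second variable is given the same mean in both components, so it plays the role of a noise dimension.

```python
import bisect
import random


def simulate_ordinal(n, weights, means, sd, thresholds, seed=0):
    """Hypothetical helper: draw n ordinal observations by thresholding
    latent Gaussian mixture draws (underlying response variable idea)."""
    rng = random.Random(seed)
    labels, data = [], []
    for _ in range(n):
        # Pick a mixture component according to the mixing weights.
        g = rng.choices(range(len(weights)), weights=weights)[0]
        # One latent continuous value per variable (assumed independent here).
        latent = [rng.gauss(mu, sd) for mu in means[g]]
        # Observed category = number of thresholds at or below the latent value.
        row = [bisect.bisect_right(thresholds, z) for z in latent]
        labels.append(g)
        data.append(row)
    return labels, data


labels, data = simulate_ordinal(
    n=200,
    weights=[0.5, 0.5],
    means=[[-1.5, 0.0], [1.5, 0.0]],  # variable 2 has equal means: a noise dimension
    sd=1.0,
    thresholds=[-1.0, 0.0, 1.0],  # 3 cut points give 4 ordered categories, 0..3
)
```

Only `data` would be observed in practice; the latent values and the component labels are hidden, which is why likelihood evaluation requires integrating over the latent Gaussian variables.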
Pages: 1007 - 1034
Number of pages: 28
Related papers
50 in total
  • [21] Model-Based Clustering of Temporal Data
    El Assaad, Hani
    Same, Allou
    Govaert, Gerard
    Aknin, Patrice
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2013, 2013, 8131 : 9 - 16
  • [22] The Clustering of Categorical Data: A Comparison of a Model-based and a Distance-based Approach
    Anderlucci, Laura
    Hennig, Christian
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2014, 43 (04) : 704 - 721
  • [23] Model-based regression clustering for high-dimensional data: application to functional data
    Devijver, Emilie
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2017, 11 (02) : 243 - 279
  • [24] Active Clustering with Model-Based Uncertainty Reduction
    Xiong, Caiming
    Johnson, David M.
    Corso, Jason J.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (01) : 5 - 17
  • [25] Co-clustering contaminated data: a robust model-based approach
    Fibbi, Edoardo
    Perrotta, Domenico
    Torti, Francesca
    Van Aelst, Stefan
    Verdonck, Tim
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, 18 (01) : 121 - 161
  • [28] The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering
    Murtagh, Fionn
    JOURNAL OF CLASSIFICATION, 2009, 26 (03) : 249 - 277
  • [29] Model-based clustering of high-dimensional longitudinal data via regularization
    Yang, Luoying
    Wu, Tong Tong
    BIOMETRICS, 2023, 79 (02) : 761 - 774
  • [30] Model-based approach for high-dimensional non-Gaussian visual data clustering and feature weighting
    Elguebaly, Tarek
    Bouguila, Nizar
    Digital Signal Processing: A Review Journal, 2015, 40 (01) : 63 - 79