A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data

Cited by: 3
Authors
Ranalli, Monia [1]
Rocci, Roberto [2]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
[2] Univ Tor Vergata, Rome, Italy
Keywords
mixture models; reduction ordinal data; composite likelihood; STRUCTURAL EQUATION MODELS; VARIABLE SELECTION; MIXTURE-MODELS; LIKELIHOOD; EXTENSION; INVARIANCE; ANALYZERS; CRITERIA;
DOI
10.1007/s11336-017-9578-5
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
The literature on clustering for continuous data is rich and wide; by contrast, the literature on clustering for categorical data is still limited. In some cases, the clustering problem is made more difficult by the presence of noise variables or dimensions that carry no information about the clustering structure and can mask it. The aim of this paper is to propose a model for simultaneous clustering and dimensionality reduction of ordered categorical data that detects the discriminative dimensions and discards the noise ones. Following the underlying response variable approach, the observed variables are considered as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To recognize discriminative and noise dimensions, these variables are considered to be linear combinations of two independent sets of second-order latent variables, where only one set contains the information about the cluster structure while the other contains noise dimensions. The model specification involves multidimensional integrals that make maximum likelihood estimation cumbersome and in some cases infeasible. To overcome this issue, parameter estimation is carried out through an EM-like algorithm maximizing a composite log-likelihood based on low-dimensional margins. Applications to real and simulated data show the effectiveness of the proposal.
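The data-generating idea in the abstract can be illustrated with a small simulation. This is only a sketch of one possible reading, not the authors' implementation: the function name `simulate_ordinal_mixture`, the number of variables, the component means, the mixing matrix, and the thresholds are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_ordinal_mixture(n=500, seed=0):
    """Simulate ordinal data by discretizing latent Gaussian-mixture variables.

    All sizes and parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical setup: 1 discriminative and 3 noise second-order dimensions.
    weights = np.array([0.5, 0.5])
    means = np.array([[-1.5], [1.5]])  # cluster means on the discriminative dimension
    labels = rng.choice(2, size=n, p=weights)

    # Discriminative second-order variable: its mean depends on the cluster.
    discriminative = means[labels] + rng.standard_normal((n, 1))
    # Noise second-order variables: same distribution in every cluster.
    noise = rng.standard_normal((n, 3))
    second_order = np.hstack([discriminative, noise])  # shape (n, 4)

    # First-order latent variables: linear combinations of the second-order ones.
    mixing = rng.standard_normal((4, 4))  # hypothetical mixing matrix
    first_order = second_order @ mixing.T

    # Observed ordinal variables: discretize with fixed thresholds (categories 0..3).
    thresholds = np.array([-1.0, 0.0, 1.0])
    ordinal = np.digitize(first_order, thresholds)
    return ordinal, labels

ordinal, labels = simulate_ordinal_mixture()
print(ordinal[:5])
```

Only the first second-order dimension separates the two clusters here; after mixing and discretization, that structure is spread across all four observed ordinal variables, which is the situation the proposed model is meant to untangle.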
Pages: 1007-1034
Page count: 28
Related papers
50 in total
  • [31] Model-based approach for high-dimensional non-Gaussian visual data clustering and feature weighting
    Elguebaly, Tarek
    Bouguila, Nizar
    DIGITAL SIGNAL PROCESSING, 2015, 40 : 63 - 79
  • [32] A model-based approach for simultaneous water and energy reduction in a pulp and paper mill
    Chew, Irene Mei Leng
    Foo, Dominic Chwan Yee
    Bonhivers, Jean-Christophe
    Stuart, Paul
    Alva-Argaez, Alberto
    Savulescu, Luciana Elena
    APPLIED THERMAL ENGINEERING, 2013, 51 (1-2) : 393 - 400
  • [33] Model selection for mixture-based clustering for ordinal data
    Fernandez, D.
    Arnold, R.
    AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2016, 58 (04) : 437 - 472
  • [34] Model-based clustering with dissimilarities: A Bayesian approach
    Oh, Man-Suk
    Raftery, Adrian E.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2007, 16 (03) : 559 - 585
  • [35] Model-based clustering with missing not at random data
    Sportisse, Aude
    Marbac, Matthieu
    Laporte, Fabien
    Celeux, Gilles
    Boyer, Claire
    Josse, Julie
    Biernacki, Christophe
    STATISTICS AND COMPUTING, 2024, 34 (04)
  • [36] Model-based clustering and classification of functional data
    Chamroukhi, Faicel
    Nguyen, Hien D.
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2019, 9 (04)
  • [37] On model-based clustering of skewed matrix data
    Melnykov, Volodymyr
    Zhu, Xuwen
    JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 167 : 181 - 194
  • [38] Model-based Clustering and Classification for Data Science
    Unwin, Antony
    INTERNATIONAL STATISTICAL REVIEW, 2020, 88 (01) : 263 - 264
  • [39] Model-based clustering of array CGH data
    Shah, Sohrab P.
    Cheung, K-John, Jr.
    Johnson, Nathalie A.
    Alain, Guillaume
    Gascoyne, Randy D.
    Horsman, Douglas E.
    Ng, Raymond T.
    Murphy, Kevin P.
    BIOINFORMATICS, 2009, 25 (12) : I30 - I38
  • [40] Model-based multidimensional clustering of categorical data
    Chen, Tao
    Zhang, Nevin L.
    Liu, Tengfei
    Poon, Kin Man
    Wang, Yi
    ARTIFICIAL INTELLIGENCE, 2012, 176 (01) : 2246 - 2269