Model-Based Clustering for Conditionally Correlated Categorical Data

被引:0
|
Authors
Matthieu Marbac
Christophe Biernacki
Vincent Vandewalle
Affiliations
[1] Inria Lille and DGA
[2] University Lille 1
[3] CNRS and Inria
[4] University Lille 2 and Inria
Source
Journal of Classification, 2015, Vol. 32
Keywords
Categorical data; Clustering; Correlation; Expectation-Maximization algorithm; Gibbs sampler; Mixture model; Model selection
D O I
暂无
中图分类号
学科分类号
Abstract
An extension of the latent class model is presented for clustering categorical data by relaxing the classical “class conditional independence assumption” of variables. The model groups the variables into blocks that are independent of one another but dependent within each block, in order to capture the main intra-class correlations. The dependency between variables grouped in the same block of a class is taken into account by mixing two extreme distributions: independence and maximum dependency. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model. Indeed, it yields a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood via an EM algorithm. Moreover, a Gibbs sampler is used for model selection, to overcome the computational intractability of the combinatorial search for the block structure. Two applications, to medical and biological data sets, show the relevance of this new model. The results support the view that the model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
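The abstract describes each block distribution as a mixture of two extremes, independence and maximum dependency, without spelling out the second one. The sketch below is an illustration only, not the authors' implementation: it assumes, hypothetically, that maximum dependency means the first variable of a block follows a distribution `tau` and every other variable in the block is a deterministic function of it, given by hypothetical mappings `delta`. All names (`rho`, `alpha`, `tau`, `delta`, `block_prob`, `mixture_prob`) are illustrative. It shows how a weight `rho` interpolates between the two extremes and how class-specific blocks combine into a mixture. EM estimation and Gibbs-sampler model selection are not covered.

```python
from math import prod


def independence_prob(x, alpha):
    # alpha[j][m]: probability that the j-th variable of the block takes modality m.
    # Under independence the block probability is the product of the marginals.
    return prod(alpha[j][xj] for j, xj in enumerate(x))


def max_dependency_prob(x, tau, delta):
    # HYPOTHETICAL encoding of "maximum dependency": x[0] ~ tau, and each other
    # variable h is forced to equal delta[h - 1][x[0]]; any other pattern has probability 0.
    x0 = x[0]
    consistent = all(x[h] == delta[h - 1][x0] for h in range(1, len(x)))
    return tau[x0] if consistent else 0.0


def block_prob(x, rho, alpha, tau, delta):
    # Mixture of the two extremes: rho = 0 gives independence, rho = 1 gives
    # the (assumed) maximum-dependency law.
    return (1.0 - rho) * independence_prob(x, alpha) + rho * max_dependency_prob(x, tau, delta)


def mixture_prob(x, weights, classes):
    # weights[k]: proportion of class k.
    # classes[k]: list of blocks (variable_indices, rho, alpha, tau, delta) for class k;
    # blocks are independent given the class, so their probabilities multiply.
    total = 0.0
    for pi_k, blocks in zip(weights, classes):
        p_k = 1.0
        for idx, rho, alpha, tau, delta in blocks:
            p_k *= block_prob([x[i] for i in idx], rho, alpha, tau, delta)
        total += pi_k * p_k
    return total
```

For a block of two binary variables with uniform marginals `alpha = [[0.5, 0.5], [0.5, 0.5]]`, `tau = [0.5, 0.5]`, identity mapping `delta = [[0, 1]]` and `rho = 0.6`, the concordant pattern `(0, 0)` gets probability 0.4 × 0.25 + 0.6 × 0.5 = 0.4, while the discordant pattern `(0, 1)` gets only 0.4 × 0.25 = 0.1. A single parameter `rho` therefore shifts mass toward the dependent patterns, which is one way to read the abstract's claim of "only a few additional parameters".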
引用
收藏
Pages: 145-175
相关论文
共 50 条
  • [21] On model-based clustering of skewed matrix data
    Melnykov, Volodymyr
    Zhu, Xuwen
    JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 167 : 181 - 194
  • [22] Model-based Clustering and Classification for Data Science
    Unwin, Antony
    INTERNATIONAL STATISTICAL REVIEW, 2020, 88 (01) : 263 - 264
  • [23] Model-based clustering of array CGH data
    Shah, Sohrab P.
    Cheung, K-John, Jr.
    Johnson, Nathalie A.
    Alain, Guillaume
    Gascoyne, Randy D.
    Horsman, Douglas E.
    Ng, Raymond T.
    Murphy, Kevin P.
    BIOINFORMATICS, 2009, 25 (12) : I30 - I38
  • [24] Model-based clustering for multivariate functional data
    Jacques, Julien
    Preda, Cristian
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 : 92 - 106
  • [25] Penalized model-based clustering of fMRI data
    Dilernia, Andrew
    Quevedo, Karina
    Camchong, Jazmin
    Lim, Kelvin
    Pan, Wei
    Zhang, Lin
    BIOSTATISTICS, 2022, 23 (03) : 825 - 843
  • [26] Cloud model-based outlier detection algorithm for categorical data
    Lei, Dajiang
    Zhang, Liping
    Zhang, Lisheng
    Lei, D. (leidj@cqupt.edu.cn), 1600, Science and Engineering Research Support Society, 20 Virginia Court, Sandy Bay, Tasmania, Australia (06): : 199 - 214
  • [27] Clustering categorical data by utilizing the correlated-force ensemble
    Chuang, KT
    Chen, MS
    Proceedings of the Fourth SIAM International Conference on Data Mining, 2004, : 269 - 278
  • [28] Clustering Categorical Data Based on Representatives
    Aranganayagi, S.
    Thangavel, K.
    THIRD 2008 INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2008, : 599 - +
  • [29] Model-based control chart for autoregressive and correlated data
    Loredo, EN
    Jearkpaporn, D
    Borror, CM
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2002, 18 (06) : 489 - 496
  • [30] Efficiency Based Categorical Data Clustering
    Kalaivani, K.
    Raghavendra, A. P. V.
    2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2012, : 550 - 553