Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach

被引:0
|
作者
Baek, Jangsun [1 ]
Park, Jeong-Soo [1 ]
机构
[1] Chonnam Natl Univ, Dept Stat, Gwangju, South Korea
基金
新加坡国家研究基金会;
关键词
Categorical data; Model-based clustering; Networks; Penalized composite likelihood; K-MEANS ALGORITHM; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; MODEL SELECTION; LATENT; ANALYZERS;
D O I
10.1080/00031305.2022.2141856
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
One of the challenges in clustering categorical data is the curse of dimensionality caused by the inherent sparsity of high-dimensional data, the records of which include a large number of attributes. The latent class model (LCM) assumes local independence between the variables in clusters, and is a parsimonious model-based clustering approach that has been used to circumvent the problem. The mixture of a log-linear model is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be conceived as a network with pairwise linked nodes, which are the response levels of the observation attributes. Therefore, the categorical data for clustering is considered a finite mixture of different component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and satisfy asymptotic normality. The performance of the proposed approach is shown to be better in comparison with the conventional methods for both synthetic and real datasets.
引用
收藏
页码:259 / 273
页数:15
相关论文
共 50 条
  • [31] Efficiency Based Categorical Data Clustering
    Kalaivani, K.
    Raghavendra, A. P. V.
    2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2012, : 550 - 553
  • [32] Clustering Categorical Data Based on Representatives
    Aranganayagi, S.
    Thangavel, K.
    THIRD 2008 INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2008, : 599 - +
  • [33] Clustering categorical data in projected spaces
    Mohamed Bouguessa
    Data Mining and Knowledge Discovery, 2015, 29 : 3 - 38
  • [34] Fuzzy rough clustering for categorical data
    Shuliang Xu
    Shenglan Liu
    Jian Zhou
    Lin Feng
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 3213 - 3223
  • [35] Weighted Topological Clustering for Categorical Data
    Rogovschi, Nicoleta
    Nadif, Mohamed
    NEURAL INFORMATION PROCESSING, PT I, 2011, 7062 : 599 - +
  • [36] Clustering categorical data in projected spaces
    Bouguessa, Mohamed
    DATA MINING AND KNOWLEDGE DISCOVERY, 2015, 29 (01) : 3 - 38
  • [37] Incremental Clustering for Categorical Data Using Clustering Ensemble
    Li Taoying
    Chne Yan
    Qu Lili
    Mu Xiangwei
    PROCEEDINGS OF THE 29TH CHINESE CONTROL CONFERENCE, 2010, : 2519 - 2524
  • [38] Fuzzy rough clustering for categorical data
    Xu, Shuliang
    Liu, Shenglan
    Zhou, Jian
    Feng, Lin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (11) : 3213 - 3223
  • [39] On the maximum-likelihood analysis of the general linear model in categorical data
    Paulino, CDM
    Silva, GL
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1999, 30 (02) : 197 - 204
  • [40] Likelihood analysis of joint marginal and conditional models for longitudinal categorical data
    Chen, Baojiang
    Yi, Grace Y.
    Cook, Richard J.
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2009, 37 (02): : 182 - 205