Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach

Times cited: 0
Authors
Baek, Jangsun [1]
Park, Jeong-Soo [1]
Affiliations
[1] Chonnam Natl Univ, Dept Stat, Gwangju, South Korea
Funding
National Research Foundation, Singapore
Keywords
Categorical data; Model-based clustering; Networks; Penalized composite likelihood; K-MEANS ALGORITHM; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; MODEL SELECTION; LATENT; ANALYZERS;
DOI
10.1080/00031305.2022.2141856
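Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract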
One challenge in clustering categorical data is the curse of dimensionality: high-dimensional records with many attributes are inherently sparse. The latent class model (LCM), a parsimonious model-based clustering approach, assumes local independence between the variables within each cluster and has been used to circumvent this problem. A mixture of log-linear models is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be viewed as a network with pairwise linked nodes, where the nodes are the response levels of the observation's attributes. The categorical data to be clustered are therefore treated as a finite mixture of component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and asymptotically normal. On both synthetic and real datasets, the proposed approach is shown to perform better than conventional methods.
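The network view of a single observation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `observation_to_network`, the dictionary encoding of a record, and the example attributes are all assumptions made for illustration. The sketch only builds the nodes (one per attribute, labeled by its observed response level) and links every pair of them; it does not fit the mixture model or the penalized composite likelihood.

```python
from itertools import combinations


def observation_to_network(record):
    """Turn one categorical record into a fully pairwise-linked network.

    record: dict mapping attribute name -> observed response level
            (a hypothetical encoding, chosen for illustration).
    Returns (nodes, edges), where each node is an (attribute, level)
    pair and each edge links two distinct nodes.
    """
    nodes = sorted(record.items())
    # Every pair of observed response levels is linked once.
    edges = list(combinations(nodes, 2))
    return nodes, edges


# Hypothetical record with three categorical attributes.
record = {"color": "red", "size": "M", "shape": "round"}
nodes, edges = observation_to_network(record)
print(len(nodes), "nodes,", len(edges), "edges")
```

Under this encoding a record with p attributes yields p nodes and p(p-1)/2 edges, which suggests why a model built on pairwise links can use far fewer parameters than one over the full joint table of response levels.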
Pages: 259-273
Number of pages: 15
Related papers
50 in total
  • [41] Categorical data clustering: A correlation-based approach for unsupervised attribute weighting
    Carbonera, Joel Luis
    Abel, Mara
    2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2014, : 259 - 263
  • [42] A New Stratified Immune Based Approach for Clustering High Dimensional Categorical Data
    Narayana, G. Surya
    Vasumathi, D.
    Prasanna, K.
    EMERGING TRENDS IN ELECTRICAL, COMMUNICATIONS AND INFORMATION TECHNOLOGIES, 2017, 394 : 141 - 150
  • [43] A fuzzy k-modes algorithm for clustering categorical data
    Huang, ZX
    Ng, MK
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 1999, 7 (04) : 446 - 452
  • [44] Model-Based Clustering for Conditionally Correlated Categorical Data
    Marbac, Matthieu
    Biernacki, Christophe
    Vandewalle, Vincent
    JOURNAL OF CLASSIFICATION, 2015, 32 (02) : 145 - 175
  • [45] Model-Based Clustering for Conditionally Correlated Categorical Data
    Matthieu Marbac
    Christophe Biernacki
    Vincent Vandewalle
    Journal of Classification, 2015, 32 : 145 - 175
  • [46] An entropy-based subspace clustering algorithm for categorical data
    Carbonera, Joel Luis
    Abel, Mara
    2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2014, : 272 - 277
  • [47] Maximum likelihood clustering via normal mixture models
    McLachlan, GJ
    Peel, D
    Whiten, WJ
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 1996, 8 (02) : 105 - 111
  • [48] Penalized composite quasi-likelihood for ultrahigh dimensional variable selection
    Bradic, Jelena
    Fan, Jianqing
    Wang, Weiwei
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2011, 73 : 325 - 349
  • [49] AN EM COMPOSITE LIKELIHOOD APPROACH FOR MULTISTAGE SAMPLING OF FAMILY DATA
    Choi, Y.
    Briollais, L.
    STATISTICA SINICA, 2011, 21 (01) : 231 - 253
  • [50] Kernel Subspace Clustering Algorithm for Categorical Data
    Xu K.-P.
    Chen L.-F.
    Sun H.-J.
    Wang B.-Z.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (11) : 3492 - 3505