Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach

Cited by: 0

Authors:

Baek, Jangsun [1]

Park, Jeong-Soo [1]

Affiliations:

[1] Chonnam Natl Univ, Dept Stat, Gwangju, South Korea

Source:

AMERICAN STATISTICIAN | 2023 / Vol. 77 / No. 03

Funding:

National Research Foundation of Singapore;

Keywords:

Categorical data; Model-based clustering; Networks; Penalized composite likelihood; K-MEANS ALGORITHM; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; MODEL SELECTION; LATENT; ANALYZERS;

DOI:

10.1080/00031305.2022.2141856

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

One of the challenges in clustering categorical data is the curse of dimensionality caused by the inherent sparsity of high-dimensional data, the records of which include a large number of attributes. The latent class model (LCM) assumes local independence between the variables in clusters, and is a parsimonious model-based clustering approach that has been used to circumvent the problem. The mixture of a log-linear model is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be conceived as a network with pairwise linked nodes, which are the response levels of the observation attributes. Therefore, the categorical data for clustering is considered a finite mixture of different component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and satisfy asymptotic normality. The performance of the proposed approach is shown to be better in comparison with the conventional methods for both synthetic and real datasets.
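
The abstract's idea that each categorical observation can be read as a network with pairwise linked nodes can be illustrated with a minimal sketch. This is only one illustrative reading of that sentence, not the authors' implementation: the function name `observation_to_network`, the (attribute index, level) node encoding, and the example record are all hypothetical.

```python
from itertools import combinations


def observation_to_network(obs):
    """Turn one categorical record into a small fully linked network.

    obs: sequence of response levels, one entry per attribute.
    Each node is an (attribute_index, level) pair (an assumed encoding);
    every pair of nodes in the same record is joined by an edge.
    """
    nodes = [(j, level) for j, level in enumerate(obs)]
    edges = list(combinations(nodes, 2))
    return nodes, edges


# Hypothetical record with three attributes.
record = ["red", "small", "yes"]
nodes, edges = observation_to_network(record)
for edge in edges:
    print(edge)
```

Under this encoding, a record with p attributes yields p nodes and p(p-1)/2 edges, so a model built on pairs of attributes works with pairwise terms rather than the full joint table over all attributes.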


Pages: 259-273

Page count: 15

