Gaussian Mixture Model Clustering with Incomplete Data

Cited by: 34

Authors:

Zhang, Yi [1]

Li, Miaomiao [1,2]

Wang, Siwei [1]

Dai, Sisi [1]

Luo, Lei [1]

Zhu, En [1]

Xu, Huiying [3,4]

Zhu, Xinzhong [3]

Yao, Chaoyun [5]

Zhou, Haoran [6]

Affiliations:

[1] NUDT, Sch Comp, Changsha, Peoples R China

[2] Changsha Univ, Changsha, Hunan, Peoples R China

[3] Zhejiang Normal Univ, Coll Math & Comp Sci, Hangzhou, Zhejiang, Peoples R China

[4] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

[5] NUDT, Lab Complex Electromagnet Environm Effects Elect, Changsha, Peoples R China

[6] Chongqing Univ Technol, Chongqing, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2021, Vol. 17, No. 1

Funding:

National Natural Science Foundation of China;

Keywords:

GMM; clustering; EM; incomplete data;

DOI:

10.1145/3408318

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Gaussian mixture model (GMM) clustering has been extensively studied due to its effectiveness and efficiency. Although it demonstrates promising performance in various applications, it cannot effectively handle absent features in the data, which are not uncommon in practice. In this article, unlike existing approaches that first impute the missing values and then perform GMM clustering on the imputed data, we propose to integrate imputation and GMM clustering into a unified learning procedure. Specifically, the missing data is filled in using the result of GMM clustering, and the imputed data is then used for GMM clustering. These two steps alternately negotiate with each other to reach an optimum. In this way, the imputed data can best serve GMM clustering. A two-step alternating algorithm with proven convergence is carefully designed to solve the resulting optimization problem. Extensive experiments have been conducted on eight UCI benchmark datasets, and the results validate the effectiveness of the proposed algorithm.
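The alternation described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the update rules, so the code below assumes diagonal covariances, column-mean initialization of the missing entries, and a simple imputation rule that fills each missing entry with the responsibility-weighted component mean. The function name `gmm_impute_cluster` and all of its parameters are hypothetical.

```python
import numpy as np

def gmm_impute_cluster(X, n_components=2, n_iter=50, seed=0):
    """Alternate a diagonal-covariance GMM update with re-imputation.

    Illustrative assumption only, not the paper's exact procedure.
    Missing entries in X are marked with NaN.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial fill: column means over the observed entries.
    Xf = np.where(missing, np.nanmean(X, axis=0), X)
    n, d = Xf.shape

    # Initialize means from random rows, shared variances, uniform weights.
    mu = Xf[rng.choice(n, n_components, replace=False)].copy()
    var = np.tile(Xf.var(axis=0) + 1e-6, (n_components, 1))
    pi = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # Clustering step (E): responsibilities under diagonal Gaussians.
        diff = Xf[:, None, :] - mu[None, :, :]
        log_p = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2)
        log_p += np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # Clustering step (M): update weights, means, variances.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (r.T @ Xf) / nk[:, None]
        diff = Xf[:, None, :] - mu[None, :, :]
        var = np.einsum("nk,nkd->kd", r, diff**2) / nk[:, None] + 1e-6

        # Imputation step: refill only the missing entries with the
        # responsibility-weighted component means; observed values are kept.
        Xf = np.where(missing, r @ mu, X)

    return r.argmax(axis=1), Xf
```

Calling `labels, X_filled = gmm_impute_cluster(X, n_components=3)` on an array with NaN entries returns hard cluster assignments together with the completed data. With full covariances, the imputation step would instead use the conditional mean of the missing features given the observed ones; the diagonal assumption is chosen here only to keep the sketch short.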


Pages: 14
