Gaussian Mixture Model Clustering with Incomplete Data

Cited by: 34
Authors
Zhang, Yi [1]
Li, Miaomiao [1, 2]
Wang, Siwei [1]
Dai, Sisi [1]
Luo, Lei [1]
Zhu, En [1]
Xu, Huiying [3, 4]
Zhu, Xinzhong [3]
Yao, Chaoyun [5]
Zhou, Haoran [6]
Affiliations
[1] NUDT, Sch Comp, Changsha, Peoples R China
[2] Changsha Univ, Changsha, Hunan, Peoples R China
[3] Zhejiang Normal Univ, Coll Math & Comp Sci, Hangzhou, Zhejiang, Peoples R China
[4] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[5] NUDT, Lab Complex Electromagnet Environm Effects Elect, Changsha, Peoples R China
[6] Chongqing Univ Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
GMM; clustering; EM; incomplete data;
DOI
10.1145/3408318
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Gaussian mixture model (GMM) clustering has been extensively studied due to its effectiveness and efficiency. Although it performs well in various applications, it cannot effectively handle absent features in the data, which are not uncommon in practice. In this article, unlike existing approaches that first impute the missing values and then perform GMM clustering on the imputed data, we propose to integrate imputation and GMM clustering into a unified learning procedure. Specifically, the missing data are filled in using the result of GMM clustering, and the imputed data are then used for GMM clustering. These two steps alternately negotiate with each other to reach an optimum. In this way, the imputed data can best serve GMM clustering. A two-step alternating algorithm with proven convergence is carefully designed to solve the resultant optimization problem. Extensive experiments have been conducted on eight UCI benchmark datasets, and the results validate the effectiveness of the proposed algorithm.
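The alternating idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes diagonal covariances, a mean-fill initialisation, and a responsibility-weighted fill of the missing entries, and the function name `gmm_cluster_incomplete`, its parameters, and the toy data are all hypothetical choices made only for illustration.

```python
import math

def gmm_cluster_incomplete(X, k, n_iter=50, var_floor=1e-3):
    """Illustrative sketch (assumption, not the paper's method): alternate
    diagonal-GMM EM updates with re-imputation of entries marked None."""
    n, d = len(X), len(X[0])
    missing = [[v is None for v in row] for row in X]
    # Initial fill: per-feature mean of the observed values
    # (assumes every feature is observed at least once).
    col_mean = []
    for j in range(d):
        obs = [row[j] for row in X if row[j] is not None]
        col_mean.append(sum(obs) / len(obs))
    Z = [[col_mean[j] if missing[i][j] else X[i][j] for j in range(d)]
         for i in range(n)]
    # Deterministic initialisation: pick k evenly spaced rows as means.
    means = [list(Z[(c * n) // k]) for c in range(k)]
    glob_var = [max(sum((Z[i][j] - col_mean[j]) ** 2 for i in range(n)) / n,
                    var_floor) for j in range(d)]
    variances = [list(glob_var) for _ in range(k)]
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in range(n)]
    for _ in range(n_iter):
        # E-step on the currently imputed data (log-sum-exp for stability).
        for i in range(n):
            logp = []
            for c in range(k):
                lp = math.log(max(weights[c], 1e-300))
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (Z[i][j] - means[c][j]) ** 2 / v)
                logp.append(lp)
            m = max(logp)
            e = [math.exp(lp - m) for lp in logp]
            s = sum(e)
            resp[i] = [x / s for x in e]
        # M-step: update mixing weights, means, and diagonal variances.
        for c in range(k):
            nc = max(sum(resp[i][c] for i in range(n)), 1e-12)
            weights[c] = nc / n
            for j in range(d):
                mu = sum(resp[i][c] * Z[i][j] for i in range(n)) / nc
                var = sum(resp[i][c] * (Z[i][j] - mu) ** 2 for i in range(n)) / nc
                means[c][j] = mu
                variances[c][j] = max(var, var_floor)
        # Imputation step: refill missing entries from the clustering result.
        for i in range(n):
            for j in range(d):
                if missing[i][j]:
                    Z[i][j] = sum(resp[i][c] * means[c][j] for c in range(k))
    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    return labels, Z, means

# Toy data (made up for illustration): two separated groups, None = missing.
X = [[0.0, 0.1], [0.2, None], [0.1, -0.1], [None, 0.0],
     [5.0, 5.1], [5.2, None], [4.9, 5.0], [None, 4.8]]
labels, filled, means = gmm_cluster_incomplete(X, k=2)
```

Observed entries are never overwritten; only the entries marked `None` are re-estimated in each pass, so the clustering and the imputation inform each other as the abstract describes. With diagonal covariances the fill reduces to a responsibility-weighted mix of component means, which is a simplification chosen here for brevity.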
Pages: 14
Related papers
50 in total
  • [31] A partial order framework for incomplete data clustering
    Yahyaoui, Hamdi
    AboElfotoh, Hosam
    Shu, Yanjun
    APPLIED INTELLIGENCE, 2023, 53 (07) : 7439 - 7454
  • [32] Mixture model clustering for mixed data with missing information
    Hunt, L
    Jorgensen, M
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2003, 41 (3-4) : 429 - 440
  • [33] Clustering compositional data using Dirichlet mixture model
    Pal, Samyajoy
    Heumann, Christian
    PLOS ONE, 2022, 17 (05):
  • [34] Exploiting Gaussian Mixture Model Clustering for Full-Duplex Transceiver Design
    Chen, Jie
    Zhang, Lin
    Liang, Ying-Chang
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2019, 67 (08) : 5802 - 5816
  • [35] Information-Theoretic Clustering for Gaussian Mixture Model via Divergence Factorization
    Duan, Jiuding
    Wang, Yan
    PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 565 - 573
  • [36] Clustering based on a multilayer mixture model
    Li, J
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2005, 14 (03) : 547 - 568
  • [37] Imputation by Gaussian Copula Model with an Application to Incomplete Customer Satisfaction Data
    Kaarik, Meelis
    Kaarik, Ene
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 485 - 492
  • [38] A new clustering method of gene expression data based on multivariate Gaussian mixture models
    Liu, Zhe
    Song, Yu-qing
    Xie, Cong-hua
    Tang, Zheng
    SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (02) : 359 - 368
  • [39] A new clustering method of gene expression data based on multivariate Gaussian mixture models
    Zhe Liu
    Yu-qing Song
    Cong-hua Xie
    Zheng Tang
    Signal, Image and Video Processing, 2016, 10 : 359 - 368
  • [40] Model-based clustering of Gaussian copulas for mixed data
    Marbac, Matthieu
    Biernacki, Christophe
    Vandewalle, Vincent
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2017, 46 (23) : 11635 - 11656