Probabilistic reduced K-means cluster analysis

Cited by: 0
Authors
Lee, Seunghoon [1]
Song, Juwon [1]
Affiliations
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL;
DOI
10.5351/KJAS.2021.34.6.905
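Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code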
020208 ; 070103 ; 0714 ;
Abstract
Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis applied after the number of variables has been reduced by a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis, for example, may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means shows better formation of clusters than non-probabilistic reduced K-means. In an application to a real data set, it revealed a similar or better cluster structure compared to other methods.
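For orientation, the contrast between tandem analysis and (non-probabilistic) reduced K-means can be written as two optimization problems. This is a sketch in the notation commonly used in the reduced K-means literature, not taken from the abstract itself: the symbols X (n x p column-centered data), U (n x K binary membership matrix), F (K x Q centroids in the reduced space), and A (p x Q column-orthonormal loadings) are assumptions introduced here. The probabilistic model proposed in the paper is not specified in the abstract and is therefore not shown.

```latex
% Tandem analysis: two separate steps.
% Step 1 (PCA): choose the subspace that retains the most variance.
\hat{A}_{\mathrm{PCA}}
  = \arg\max_{A^{\top}A = I_Q} \operatorname{tr}\!\left(A^{\top} X^{\top} X A\right)
% Step 2 (K-means on the component scores):
\min_{U,\,F}\; \left\lVert X\hat{A}_{\mathrm{PCA}} - U F \right\rVert_F^{2}

% Reduced K-means: subspace and clusters are estimated jointly.
\min_{U,\,F,\,A}\; \left\lVert X - U F A^{\top} \right\rVert_F^{2}
  \quad \text{subject to } A^{\top}A = I_Q

% Because A^{\top}A = I_Q, the joint loss splits into two orthogonal parts:
\left\lVert X - U F A^{\top} \right\rVert_F^{2}
  = \left\lVert X - X A A^{\top} \right\rVert_F^{2}
  + \left\lVert X A - U F \right\rVert_F^{2}
```

In tandem analysis the subspace is fixed by variance alone before any clustering takes place, so a high-variance variable unrelated to the clusters can dominate the retained components. In the joint problem, A is chosen together with U and F, so the cluster structure influences which subspace is kept.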
Pages: 905 - 922
Number of pages: 18
Related papers
50 in total
  • [1] Cluster analysis of crude oils with k-means based on their physicochemical properties
    Sancho, A.
    Ribeiro, J. C.
    Reis, M. S.
    Martins, F. G.
    COMPUTERS & CHEMICAL ENGINEERING, 2022, 157
  • [2] Multimorbidity patterns with K-means nonhierarchical cluster analysis
    Concepción Violán
    Albert Roso-Llorach
    Quintí Foguet-Boreu
    Marina Guisado-Clavero
    Mariona Pons-Vigués
    Enriqueta Pujol-Ribera
    Jose M. Valderas
    BMC Family Practice, 19
  • [3] Strong Consistency of Reduced K-means Clustering
    Terada, Yoshikazu
    SCANDINAVIAN JOURNAL OF STATISTICS, 2014, 41 (04) : 913 - 931
  • [4] Multimorbidity patterns with K-means nonhierarchical cluster analysis
    Violan, Concepcion
    Roso-Llorach, Albert
    Foguet-Boreu, Quinti
    Guisado-Clavero, Marina
    Pons-Vigues, Mariona
    Pujol-Ribera, Enriqueta
    Valderas, Jose M.
    BMC FAMILY PRACTICE, 2018, 19
  • [5] Integration of artificial immune network and K-means for cluster analysis
    R. J. Kuo
    S. S. Chen
    W. C. Cheng
    C. Y. Tsai
    Knowledge and Information Systems, 2014, 40 : 541 - 557
  • [6] Integration of artificial immune network and K-means for cluster analysis
    Kuo, R. J.
    Chen, S. S.
    Cheng, W. C.
    Tsai, C. Y.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 40 (03) : 541 - 557
  • [7] Factorial and reduced K-means reconsidered
    Timmerman, Marieke E.
    Ceulemans, Eva
    Kiers, Henk A. L.
    Vichi, Maurizio
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2010, 54 (07) : 1858 - 1871
  • [8] Functional factorial K-means analysis
    Yamamoto, Michio
    Terada, Yoshikazu
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 79 : 133 - 148
  • [9] A Novel Genetic Algorithm Based k-means Algorithm for Cluster Analysis
    El-Shorbagy, M. A.
    Ayoub, A. Y.
    El-Desoky, I. M.
    Mousa, A. A.
    INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018), 2018, 723 : 92 - 101
  • [10] K-means based cluster analysis of residential smart meter measurements
    Al-Wakeel, Ali
    Wu, Jianzhong
    CUE 2015 - APPLIED ENERGY SYMPOSIUM AND SUMMIT 2015: LOW CARBON CITIES AND URBAN ENERGY SYSTEMS, 2016, 88 : 754 - 760