Probabilistic reduced K-means cluster analysis

Cited by: 0
Authors
Lee, Seunghoon [1]
Song, Juwon [1]
Affiliation
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL
DOI
10.5351/KJAS.2021.34.6.905
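Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract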
Cluster analysis is an unsupervised learning technique for discovering clusters when there is no prior knowledge of group membership. K-means, a commonly used cluster analysis technique, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, to run K-means cluster analysis after reducing the number of variables with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis, for example, may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, a reformulation of reduced K-means (De Soete and Carroll, 1994) within a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a cluster structure similar to or better than that of other methods.
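The masking effect that the abstract attributes to tandem analysis can be illustrated with a minimal Python sketch. This is not the authors' code or their proposed method: the synthetic data, the helper names (`kmeans`, `pca_scores`, `agreement`), and all parameter values are assumptions chosen only for illustration. One variable carries a two-cluster structure, while a second, unrelated variable has a much larger variance, so the first principal component follows the unrelated variable.

```python
import numpy as np

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-means with several random starts; returns the best labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_loss = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old center if a cluster happens to become empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        loss = dist.min(axis=1).sum()
        if loss < best_loss:
            best_labels, best_loss = labels, loss
    return best_labels

def pca_scores(X, q):
    """Scores and loadings of the first q principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:q].T, Vt[:q]

def agreement(labels, truth):
    """Share of correctly grouped points for two clusters, up to label switching."""
    acc = np.mean(labels == truth)
    return max(acc, 1 - acc)

# Synthetic data (assumed values): cluster means at -2 and +2 on the first
# variable; the second variable is pure noise with a much larger variance.
rng = np.random.default_rng(42)
n = 100  # observations per cluster
truth = np.repeat([0, 1], n)
signal = np.where(truth == 0, -2.0, 2.0) + rng.normal(0, 0.5, 2 * n)
noise = rng.normal(0, 5.0, 2 * n)
X = np.column_stack([signal, noise])

# Tandem analysis: reduce to one principal component, then run K-means.
scores, components = pca_scores(X, q=1)
tandem_labels = kmeans(scores, k=2)

# Reference: K-means on the variable that actually carries the clusters.
direct_labels = kmeans(X[:, [0]], k=2)

print("PC1 loadings (signal, noise):", components[0])
print("tandem agreement:", agreement(tandem_labels, truth))
print("direct agreement:", agreement(direct_labels, truth))
```

In this setup the first component loads almost entirely on the noise variable, so the tandem result agrees with the true grouping only at about chance level, whereas clustering on the informative variable recovers it. Simultaneous methods such as reduced K-means are motivated by this kind of failure; the sketch does not implement them.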
Pages: 905-922
Number of pages: 18
Related papers
50 in total
  • [21] K-means cluster analysis of characteristic patterns of allergen in different ages: Real life study
    Zhao, Lei
    Fang, Jie
    Ji, Yong
    Zhang, Yingying
    Zhou, Xin
    Yin, Junfeng
    Zhang, Min
    Bao, Wuping
    CLINICAL AND TRANSLATIONAL ALLERGY, 2023, 13 (07)
  • [22] An integrated K-means - Laplacian cluster ensemble approach for document datasets
    Xu, Sen
    Chan, Kung-Sik
    Gao, Jun
    Xu, Xiufang
    Li, Xianfeng
    Hua, Xiaopeng
    An, Jing
    NEUROCOMPUTING, 2016, 214 : 495 - 507
  • [23] Balanced k-Means
    Tai, Chen-Ling
    Wang, Chen-Shu
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2017), PT II, 2017, 10192 : 75 - 82
  • [24] Reduced k-means clustering with MCA in a low-dimensional space
    Mitsuhiro, Masaki
    Yadohisa, Hiroshi
    COMPUTATIONAL STATISTICS, 2015, 30 (02) : 463 - 475
  • [25] A review of cluster analysis techniques and their uses in library and information science research: k-means and k-medoids clustering
    Lund, Brady
    Ma, Jinxuan
    PERFORMANCE MEASUREMENT AND METRICS, 2021, 22 (03) : 161 - 173
  • [26] Quantum-inspired ant lion optimized hybrid k-means for cluster analysis and intrusion detection
    Chen, Junwen
    Qi, Xuemei
    Chen, Linfeng
    Chen, Fulong
    Cheng, Guihua
    KNOWLEDGE-BASED SYSTEMS, 2020, 203
  • [27] Using k-means Cluster Analysis to Identify Symptom-Based Subtypes of Patients with Agoraphobic Fear
    Hoferichter, Esther
    Schmidt, Ruth
    Hoefler, Michael
    Hoyer, Juergen
    Rottstaedt, Fabian
    Weidner, Kerstin
    Noack, Rene
    ZEITSCHRIFT FUR KLINISCHE PSYCHOLOGIE UND PSYCHOTHERAPIE, 2019, 48 (03) : 130 - 141
  • [28] K-means analysis of construction projects in port waterfronts
    Ansorena, I. L.
    International Journal of Applied Decision Sciences, 2023, 16 (05) : 525 - 544
  • [29] Comparative Analysis of K-Means Variants Implemented in R
    Nely Almanza-Ortega, Nelva
    Perez-Ortega, Joaquin
    Crispin Zavala-Diaz, Jose
    Solis-Romero, Jose
    COMPUTACION Y SISTEMAS, 2022, 26 (01) : 125 - 133
  • [30] Functional Projection K-means
    Rocci, Roberto
    Gattone, Stefano A.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2024