Probabilistic reduced K-means cluster analysis

Authors
Lee, Seunghoon [1]
Song, Juwon [1]
Affiliation
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL
DOI
10.5351/KJAS.2021.34.6.905
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis applied after reducing the number of variables with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a cluster structure similar to or better than those found by other methods.
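The masking effect that the abstract attributes to tandem analysis can be illustrated with a small toy example. The sketch below is not the authors' method or their simulation design. It assumes a hypothetical two-variable data set in which the first variable separates two clusters and the second is unrelated noise with a much larger variance. All function names, sample sizes, and variances are illustrative assumptions. It compares K-means on the first principal component (tandem analysis) with K-means on the cluster-relevant variable, which serves here only as an oracle baseline.

```python
import math
import random


def make_data(n_per_cluster=100, seed=0):
    """Toy data (assumed): clusters differ in x1 only; x2 is high-variance noise."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate((-2.0, 2.0)):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(center, 0.5), rng.gauss(0.0, 10.0)))
            labels.append(label)
    return points, labels


def first_principal_axis(points):
    """Return the leading eigenvector of the 2x2 sample covariance and the mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form orientation of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (mx, my)


def kmeans_1d(values, n_iter=100):
    """Two-cluster K-means (Lloyd iterations) on one-dimensional values."""
    centers = [min(values), max(values)]
    assign = None
    for _ in range(n_iter):
        new_assign = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                      for v in values]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for k in (0, 1):
            members = [v for v, a in zip(values, assign) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assign


def accuracy(assign, labels):
    """Share of points matching the true labels, up to swapping cluster names."""
    agree = sum(a == l for a, l in zip(assign, labels)) / len(labels)
    return max(agree, 1.0 - agree)


points, labels = make_data()

# Tandem analysis: project onto the first principal component, then cluster.
axis, mean = first_principal_axis(points)
pc1_scores = [(p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1]
              for p in points]
tandem_acc = accuracy(kmeans_1d(pc1_scores), labels)

# Oracle baseline: cluster on the variable that actually carries the structure.
oracle_acc = accuracy(kmeans_1d([p[0] for p in points]), labels)

print("first principal axis:", axis)
print("tandem (PCA then K-means) accuracy:", tandem_acc)
print("oracle (cluster-relevant variable) accuracy:", oracle_acc)
```

In this toy setting the first principal axis lines up almost entirely with the noisy variable, so the tandem step discards the direction that separates the clusters. Methods that choose the subspace and the clusters jointly, such as reduced K-means and the probabilistic version proposed in the paper, are intended to avoid this.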
Pages: 905-922
Number of pages: 18