Probabilistic reduced K-means cluster analysis

Cited: 0
Authors
Lee, Seunghoon [1 ]
Song, Juwon [1 ]
Affiliations
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL;
DOI
10.5351/KJAS.2021.34.6.905
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline codes
020208; 070103; 0714;
Abstract
Cluster analysis is an unsupervised learning technique for discovering clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used clustering techniques, may fail when the number of variables becomes large. In such high-dimensional settings, it is common to perform tandem analysis: K-means cluster analysis after reducing the number of variables with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. In particular, principal component analysis may mask the cluster structure when variables unrelated to that structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, a transition of reduced K-means (De Soete and Carroll, 1994) into a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a similar or better cluster structure compared to the other methods.
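The tandem analysis that the abstract uses as a baseline can be sketched as follows. This is a minimal illustration on synthetic data (all variable names and the data-generating setup are invented for the example, not taken from the paper), and it deliberately reproduces the failure mode the abstract warns about: the noise variables are given larger variance than the cluster-bearing variables, so PCA's leading components need not align with the cluster structure. It is not the authors' probabilistic reduced K-means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 3 clusters living in 2 informative dimensions,
# plus 8 high-variance noise dimensions unrelated to cluster structure.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = rng.integers(0, 3, size=150)
informative = centers[labels] + rng.normal(scale=0.5, size=(150, 2))
noise = rng.normal(scale=5.0, size=(150, 8))   # variance dominates PCA
X = np.hstack([informative, noise])            # shape (150, 10)

# Step 1 of tandem analysis: PCA via SVD of the centered data,
# keeping the first two principal component scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                          # shape (150, 2)

# Step 2: plain K-means (Lloyd's algorithm) on the reduced scores.
def kmeans(Z, k, iters=100, seed=0):
    r = np.random.default_rng(seed)
    cent = Z[r.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((Z[:, None] - cent) ** 2).sum(-1), axis=1)
        # Keep a centroid unchanged if its cluster goes empty.
        cent = np.array([Z[assign == j].mean(axis=0)
                         if np.any(assign == j) else cent[j]
                         for j in range(k)])
    return assign

assign = kmeans(scores, 3)
```

Because the eight noise variables have a much larger variance than the two informative ones, the leading principal components can track noise rather than the clusters, so the K-means step may recover the three groups poorly; this is exactly the motivation the abstract gives for performing dimension reduction and clustering simultaneously.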
Pages: 905 - 922
Page count: 18
Related papers (50 total)
  • [41] Factorial k-means analysis for two-way data
    Vichi, M
    Kiers, HAL
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2001, 37 (01) : 49 - 64
  • [42] Efficient portfolio construction by means of CVaR and k-means plus plus clustering analysis: Evidence from the NYSE
    Soleymani, Fazlollah
    Vasighi, Mahdi
    INTERNATIONAL JOURNAL OF FINANCE & ECONOMICS, 2022, 27 (03) : 3679 - 3693
  • [43] Beyond k-Means plus plus : Towards better cluster exploration with geometrical information
    Ping, Yuan
    Li, Huina
    Hao, Bin
    Guo, Chun
    Wang, Baocang
    PATTERN RECOGNITION, 2024, 146
  • [44] Functional k-means inverse regression
    Wang, Guochang
    Lin, Nan
    Zhang, Baoxue
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 70 : 172 - 182
  • [45] Soft geodesic kernel K-MEANS
    Kim, Joehwan
    Shim, Kwang-Hyun
    Choi, Seungjin
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007, : 429 - +
  • [46] Discriminative K-Means Laplacian Clustering
    Chao, Guoqing
    NEURAL PROCESSING LETTERS, 2019, 49 (01) : 393 - 405
  • [47] Feature weighting in k-means clustering
    Modha, DS
    Spangler, WS
    MACHINE LEARNING, 2003, 52 (03) : 217 - 237
  • [48] Locality Sensitive K-means Clustering
    Liu, Chien-Liang
    Hsaio, Wen-Hoar
    Chang, Tao-Hsing
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2018, 34 (01) : 289 - 305
  • [49] Selective inference for k-means clustering
    Chen, Yiqun T.
    Witten, Daniela M.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [50] DEEP CLUSTERING WITH CONCRETE K-MEANS
    Gao, Boyan
    Yang, Yongxin
    Gouk, Henry
    Hospedales, Timothy M.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4252 - 4256