Probabilistic reduced K-means cluster analysis

Cited by: 0

Authors:

Lee, Seunghoon [1]

Song, Juwon [1]

Affiliation:

[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea

Source:

KOREAN JOURNAL OF APPLIED STATISTICS | 2021 / Vol. 34 / No. 06

Keywords:

cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL;

DOI:

10.5351/KJAS.2021.34.6.905

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification code:

020208 ; 070103 ; 0714 ;

Abstract:

Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis applied after the number of variables has been reduced by a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a similar or better cluster structure compared to other methods.
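
The masking effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method or their simulation design; it only shows, under assumed settings, how tandem analysis (PCA followed by K-means) can miss a cluster structure that sits in a low-variance variable. The data-generating values (400 points, cluster means of -3 and 3, four noise variables with standard deviation 10), the helper names `kmeans`, `pca_scores`, and `two_cluster_accuracy`, and the farthest-point initialization are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means with a deterministic farthest-point start (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance from every point to its nearest already-chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute centroids; keep the old center if a cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def pca_scores(X, q):
    """Scores on the first q principal components, computed via SVD of centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:q].T

def two_cluster_accuracy(pred, truth):
    """Agreement with the true labels, allowing for label switching (two clusters only)."""
    agree = np.mean(pred == truth)
    return max(agree, 1.0 - agree)

rng = np.random.default_rng(0)
n = 400
truth = np.repeat([0, 1], n // 2)
# one cluster-relevant variable with modest variance
signal = np.where(truth == 0, -3.0, 3.0) + rng.normal(0.0, 1.0, n)
# four variables unrelated to the clusters, with much larger variance
noise = rng.normal(0.0, 10.0, (n, 4))
X = np.column_stack([signal, noise])

# tandem analysis: reduce to one principal component, then cluster
tandem_acc = two_cluster_accuracy(kmeans(pca_scores(X, 1), 2), truth)
# reference: cluster on the variable known (here, by construction) to carry the structure
oracle_acc = two_cluster_accuracy(kmeans(X[:, :1], 2), truth)

print(f"tandem (PCA -> K-means): {tandem_acc:.2f}")
print(f"K-means on the cluster-relevant variable: {oracle_acc:.2f}")
```

In this setup the first principal component follows the high-variance noise variables, so clustering on it recovers little of the true grouping, while the same K-means routine separates the groups once it is given the relevant variable. In practice that variable is unknown, which is the motivation the abstract gives for methods that estimate the subspace and the clusters jointly.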


Pages: 905-922

Page count: 18


