Tensor-Based Possibilistic C-Means Clustering

Authors
Benjamin, Josephine Bernadette M. [1]
Yang, Miin-Shen [2]
Affiliations
[1] Univ Santo Tomas, Dept Math & Phys, Manila 1008, Philippines
[2] Chung Yuan Christian Univ, Dept Appl Math, Taoyuan 32023, Taiwan
Keywords
Tensors; Clustering algorithms; Phase change materials; Arrays; Linear programming; Heuristic algorithms; Euclidean distance; Clustering; possibilistic C-means (PCM); tensor data; tensor decomposition; tensor distance (TD); tensor-based clustering; tensor-based PCM (TPCM); COMPONENT ANALYSIS; BIG DATA; DECOMPOSITIONS; ALGORITHMS; DISTANCE; SHIFT
DOI
10.1109/TFUZZ.2024.3435730
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current data acquisition techniques make it possible to gather and store extensive datasets, including multidimensional arrays, and recent research has focused on analyzing such large datasets with diverse data points. These multidimensional datasets can be represented as tensors, that is, multidimensional arrays. Clustering is a data analysis technique that can discover and reveal latent patterns in these datasets. However, traditional clustering algorithms such as k-means, fuzzy c-means (FCM), and possibilistic c-means (PCM) struggle to deliver high-quality clustering results efficiently for tensor or multidimensional array data. This may stem from the fact that these algorithms are primarily designed for single-view or low-order datasets, which makes them less suited to the complexity of multidimensional arrays. To address this challenge, this article introduces the tensor-based PCM (TPCM) algorithm. Instead of the usual Euclidean distance, TPCM uses a tensor distance (TD) function as its distance metric. The TD function evaluates the distance between data points and cluster centers while accounting for relationships among different coordinates. To further enhance the analysis, canonical polyadic decomposition (CPD) and PARAFAC2 decomposition are used to restructure heterogeneous data into low-order tensors. Our experiments consider two types of datasets: multiview datasets and tensor datasets. CPD is applied to decompose tensor data, while PARAFAC2, a CPD variant, handles multiview data whose views have feature spaces of different sizes. Through comprehensive illustrations and evaluations on synthetic and real datasets, we demonstrate the superior performance of TPCM. Experimental results show that TPCM consistently achieves higher clustering performance than most existing clustering algorithms.
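The abstract does not give TPCM's update equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a Gaussian-weighted tensor distance of the form d²(X, V) = (x − v)ᵀ G (x − v), where x and v are the flattened tensors and g_lm = exp(−‖p_l − p_m‖² / (2σ²)) / (2πσ²) couples elements l and m according to how close their coordinates p_l and p_m are. It plugs this distance into the standard Krishnapuram-Keller PCM updates. The function names (`tensor_metric`, `tensor_distance_sq`, `tpcm_sketch`), the parameters `sigma` and `m`, and the heuristic used to set the PCM scale parameters η_i are all hypothetical choices made for illustration. No CPD or PARAFAC2 preprocessing is included, so the sketch operates directly on small, already low-order tensors.

```python
import numpy as np


def tensor_metric(shape, sigma=1.0):
    """Metric matrix G of a Gaussian-weighted tensor distance (assumed form).

    Element l of a tensor with this shape sits at integer coordinate p_l, and
    g_lm = exp(-||p_l - p_m||^2 / (2 sigma^2)) / (2 pi sigma^2).
    """
    coords = np.indices(shape).reshape(len(shape), -1).T.astype(float)  # (n_elem, order)
    sq = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)


def tensor_distance_sq(X, V, G):
    """Squared TD (x - v)^T G (x - v) for every row of X against every row of V."""
    diff = X[:, None, :] - V[None, :, :]  # (n, c, n_elem)
    d2 = np.einsum("ncl,lm,ncm->nc", diff, G, diff)
    return np.maximum(d2, 0.0)  # clip tiny negative round-off


def tpcm_sketch(data, centers, m=2.0, sigma=1.0, n_iter=100, tol=1e-6):
    """PCM iterations with TD in place of Euclidean distance (illustrative only)."""
    n, c = data.shape[0], centers.shape[0]
    X = data.reshape(n, -1).astype(float)
    V = centers.reshape(c, -1).astype(float).copy()
    G = tensor_metric(data.shape[1:], sigma)

    # Hypothetical heuristic for eta_i: mean squared TD of the points nearest center i.
    d2 = tensor_distance_sq(X, V, G)
    nearest = d2.argmin(axis=1)
    eta = np.array([d2[nearest == i, i].mean() if np.any(nearest == i) else d2[:, i].mean()
                    for i in range(c)])

    T = None
    for _ in range(n_iter):
        d2 = tensor_distance_sq(X, V, G)
        T = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))  # typicalities, shape (n, c)
        W = T**m
        # G is positive definite, so the TD-weighted objective is minimized
        # by the ordinary weighted mean of the flattened tensors.
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return T, V.reshape(centers.shape)


# Usage: two well-separated groups of 2 x 3 tensors.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 0.3, size=(20, 2, 3)),
                       rng.normal(3.0, 0.3, size=(20, 2, 3))])
T, V = tpcm_sketch(data, centers=data[[0, 20]])
labels = T.argmax(axis=1)
print(labels)
```

Folding the coordinate relationships into the matrix G keeps the PCM typicality and center updates in their familiar form. As `sigma` shrinks, G approaches a scaled identity matrix and the distance approaches a scaled Euclidean distance, which makes the effect of the coordinate coupling easy to compare against plain PCM.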
Pages: 5939-5950
Number of pages: 12