Selection of the number of clusters in functional data analysis

Cited by: 2

Authors:

Zambom, Adriano Zanin [1]

Alfonso Collazos, Julian [2]

Dias, Ronaldo [3]

Affiliations:

[1] Calif State Univ Northridge, Dept Math, 18111 Nordhoff St, Northridge, CA 91330 USA

[2] New Granada Mil Univ, Dept Math, Bogotá, Colombia

[3] State Univ Campinas UNICAMP, Dept Stat, Sao Paulo, SP, Brazil

Source:

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION | 2022 / Vol. 92 / Issue 14

Funding:

São Paulo Research Foundation, Brazil;

Keywords:

Parallelism; test statistic; K-means algorithm; ANOVA; clustering; DATA SET; MODEL; ALGORITHMS;

DOI:

10.1080/00949655.2022.2053855

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Identifying the number K of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of K that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures in selecting the optimal K. The main idea is to use a combination of two test statistics, which measure the lack of parallelism and the mean distance between curves, to compute criteria such as the within and between cluster sum of squares. Simulations in challenging scenarios suggest that procedures using this measure can detect the correct number of clusters more frequently than existing methods in the literature. The application of the proposed method is illustrated on several real datasets.

Pages: 2980 - 2998

Number of pages: 19

50 items in total

[31] A cluster approach to analyze preference data: Choice of the number of clusters
Sahmer, K
Vigneau, E
Qannari, EM
FOOD QUALITY AND PREFERENCE, 2006, 17 (3-4) : 257 - 265
[32] Seed selection algorithm through K-means on optimal number of clusters
Chowdhury, Kuntal
Chaudhuri, Debasis
Pal, Arup Kumar
Samal, Ashok
MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (13) : 18617 - 18651
[34] Categorical Data Clustering with Automatic Selection of Cluster Number
Liao, Hai-Yong
Ng, Michael K.
FUZZY INFORMATION AND ENGINEERING, 2009, 1 (01) : 5 - 25
[35] Latent Class Cluster Analysis: Selecting the number of clusters
Lezhnina, Olga
Kismihok, Gabor
METHODSX, 2022, 9
[36] Distance based k-means clustering algorithm for determining number of clusters for high dimensional data
Alibuhtto, Mohamed Cassim
Mahat, Nor Idayu
DECISION SCIENCE LETTERS, 2020, 9 (01) : 51 - 58
[37] NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set
Charrad, Malika
Ghazzali, Nadia
Boiteau, Veronique
Niknafs, Azam
JOURNAL OF STATISTICAL SOFTWARE, 2014, 61 (06) : 1 - 36
[38] Sequential clustering with particle filters - Estimating the number of clusters from data
Schubert, J
Sidenbladh, H
2005 7TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), VOLS 1 AND 2, 2005 : 122 - 129
[39] Automatic selection of the number of clusters using Bayesian clustering and sparsity-inducing priors
Valle, Denis
Jameel, Yusuf
Betancourt, Brenda
Azeria, Ermias T.
Attias, Nina
Cullen, Joshua
ECOLOGICAL APPLICATIONS, 2022, 32 (03)
[40] Functional data clustering: a survey
Jacques, Julien
Preda, Cristian
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2014, 8 (03) : 231 - 255
