Selection of the number of clusters in functional data analysis

Cited by: 2
Authors
Zambom, Adriano Zanin [1]
Alfonso Collazos, Julian [2]
Dias, Ronaldo [3]
Affiliations
[1] Calif State Univ Northridge, Dept Math, 18111 Nordhoff St, Northridge, CA 91330 USA
[2] New Granada Mil Univ, Dept Math, Bogotá, Colombia
[3] State Univ Campinas UNICAMP, Dept Stat, Sao Paulo, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Parallelism; test statistic; K-means algorithm; ANOVA; clustering; data set; model; algorithms
DOI
10.1080/00949655.2022.2053855
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Identifying the number K of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of K that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures in selecting the optimal K. The main idea is to use a combination of two test statistics, which measure the lack of parallelism and the mean distance between curves, to compute criteria such as the within and between cluster sum of squares. Simulations in challenging scenarios suggest that procedures using this measure can detect the correct number of clusters more frequently than existing methods in the literature. The application of the proposed method is illustrated on several real datasets.
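The abstract does not spell out the two test statistics, so the proposed measure cannot be reproduced from this record. As a rough illustration of the criteria it mentions, the sketch below computes within and between cluster sums of squares for curves evaluated on a common grid. Squared Euclidean distance is used here purely as a stand-in for the paper's parallelism and mean-distance measure, and the function name `cluster_sums_of_squares` and the toy data are assumptions made for this example.

```python
import numpy as np

def cluster_sums_of_squares(curves, labels):
    """Return (within, between) cluster sums of squares.

    curves: array of shape (n_curves, n_grid_points); each row is a curve
            evaluated on a common grid (an assumption of this sketch).
    labels: integer cluster label for each curve.
    """
    curves = np.asarray(curves, dtype=float)
    labels = np.asarray(labels)
    grand_mean = curves.mean(axis=0)
    within = 0.0
    between = 0.0
    for k in np.unique(labels):
        members = curves[labels == k]
        center = members.mean(axis=0)
        # Squared Euclidean distance stands in for the paper's measure.
        within += ((members - center) ** 2).sum()
        between += len(members) * ((center - grand_mean) ** 2).sum()
    return within, between

# Toy data (hypothetical): two groups of noisy curves with different shapes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
group_a = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((10, t.size))
group_b = 2.0 * t + 0.1 * rng.standard_normal((10, t.size))
curves = np.vstack([group_a, group_b])

true_labels = np.repeat([0, 1], 10)   # first 10 curves vs last 10
mixed_labels = np.tile([0, 1], 10)    # clusters that mix both shapes

wss, bss = cluster_sums_of_squares(curves, true_labels)
wss_mixed, bss_mixed = cluster_sums_of_squares(curves, mixed_labels)
total = ((curves - curves.mean(axis=0)) ** 2).sum()
```

With this distance the within and between sums add up to the total sum of squares for any labelling, so a partition that matches the true groups shows a small within sum and a large between sum. Procedures for selecting K typically compare such quantities across candidate values of K.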
Pages: 2980-2998
Number of pages: 19
Related papers
50 in total
  • [31] A cluster approach to analyze preference data: Choice of the number of clusters
    Sahmer, K
    Vigneau, E
    Qannari, EM
    FOOD QUALITY AND PREFERENCE, 2006, 17 (3-4) : 257 - 265
  • [32] Seed selection algorithm through K-means on optimal number of clusters
    Chowdhury, Kuntal
    Chaudhuri, Debasis
    Pal, Arup Kumar
    Samal, Ashok
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (13) : 18617 - 18651
  • [34] Categorical Data Clustering with Automatic Selection of Cluster Number
    Liao, Hai-Yong
    Ng, Michael K.
    FUZZY INFORMATION AND ENGINEERING, 2009, 1 (01) : 5 - 25
  • [35] Latent Class Cluster Analysis: Selecting the number of clusters
    Lezhnina, Olga
    Kismihok, Gabor
    METHODSX, 2022, 9
  • [36] Distance based k-means clustering algorithm for determining number of clusters for high dimensional data
    Alibuhtto, Mohamed Cassim
    Mahat, Nor Idayu
    DECISION SCIENCE LETTERS, 2020, 9 (01) : 51 - 58
  • [37] NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set
    Charrad, Malika
    Ghazzali, Nadia
    Boiteau, Veronique
    Niknafs, Azam
    JOURNAL OF STATISTICAL SOFTWARE, 2014, 61 (06) : 1 - 36
  • [38] Sequential clustering with particle filters - Estimating the number of clusters from data
    Schubert, J
    Sidenbladh, H
    2005 7TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), VOLS 1 AND 2, 2005 : 122 - 129
  • [39] Automatic selection of the number of clusters using Bayesian clustering and sparsity-inducing priors
    Valle, Denis
    Jameel, Yusuf
    Betancourt, Brenda
    Azeria, Ermias T.
    Attias, Nina
    Cullen, Joshua
    ECOLOGICAL APPLICATIONS, 2022, 32 (03)
  • [40] Functional data clustering: a survey
    Jacques, Julien
    Preda, Cristian
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2014, 8 (03) : 231 - 255