Selection of the number of clusters in functional data analysis

Cited by: 2
Authors
Zambom, Adriano Zanin [1]
Alfonso Collazos, Julian [2]
Dias, Ronaldo [3]
Affiliations
[1] Calif State Univ Northridge, Dept Math, 18111 Nordhoff St, Northridge, CA 91330 USA
[2] New Granada Mil Univ, Dept Math, Bogotá, Colombia
[3] State Univ Campinas UNICAMP, Dept Stat, Sao Paulo, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Parallelism; test statistic; K-means algorithm; ANOVA; clustering; DATA SET; MODEL; ALGORITHMS;
DOI
10.1080/00949655.2022.2053855
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Identifying the number K of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of K that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures in selecting the optimal K. The main idea is to use a combination of two test statistics, which measure the lack of parallelism and the mean distance between curves, to compute criteria such as the within and between cluster sum of squares. Simulations in challenging scenarios suggest that procedures using this measure can detect the correct number of clusters more frequently than existing methods in the literature. The application of the proposed method is illustrated on several real datasets.
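As a rough illustration of the main idea in the abstract, the sketch below combines a lack-of-parallelism measure with a mean-distance measure between curves and uses the result in a within-cluster criterion. It is not the authors' procedure: `parallelism_gap` and `mean_gap` are simple stand-ins assumed here in place of the paper's test statistics, and the equal weighting, the function names, and the toy curves are all hypothetical.

```python
import math

# ASSUMPTION: curves are sampled on a common grid and stored as lists of floats.
# The two measures below are illustrative stand-ins, not the paper's statistics.

def pointwise_diff(f, g):
    """Difference between two curves at each grid point."""
    return [a - b for a, b in zip(f, g)]

def parallelism_gap(f, g):
    """Variance of the pointwise difference; zero when the curves are parallel."""
    d = pointwise_diff(f, g)
    m = sum(d) / len(d)
    return sum((x - m) ** 2 for x in d) / len(d)

def mean_gap(f, g):
    """Squared average vertical distance between the two curves."""
    d = pointwise_diff(f, g)
    return (sum(d) / len(d)) ** 2

def dissimilarity(f, g, weight=0.5):
    """Hypothetical weighted combination of the two measures."""
    return weight * parallelism_gap(f, g) + (1 - weight) * mean_gap(f, g)

def within_cluster_total(curves, labels, weight=0.5):
    """Sum over clusters of pairwise dissimilarities divided by cluster size."""
    total = 0.0
    for k in set(labels):
        members = [c for c, lab in zip(curves, labels) if lab == k]
        pair_sum = 0.0
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pair_sum += dissimilarity(members[i], members[j], weight)
        total += pair_sum / len(members)
    return total

# Toy data: three shifted straight lines and three shifted sine curves.
grid = [i / 20 for i in range(21)]
shifts = (0.0, 0.1, 0.2)
lines = [[t + s for t in grid] for s in shifts]
sines = [[math.sin(2 * math.pi * t) + s for t in grid] for s in shifts]
curves = lines + sines

w_one = within_cluster_total(curves, [0] * 6)             # K = 1
w_two = within_cluster_total(curves, [0, 0, 0, 1, 1, 1])  # K = 2, by shape
print("within-cluster total, K=1:", w_one)
print("within-cluster total, K=2:", w_two)
```

In this toy setting the criterion drops sharply when the curves are split by shape, which is the kind of signal an elbow-type or gap-type rule for choosing K would look for. How the paper turns its measure into a selection rule is not specified in this record.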
Pages: 2980-2998
Page count: 19