Selection of the number of clusters in functional data analysis

Cited by: 2
Authors
Zambom, Adriano Zanin [1]
Alfonso Collazos, Julian [2]
Dias, Ronaldo [3]
Affiliations
[1] Calif State Univ Northridge, Dept Math, 18111 Nordhoff St, Northridge, CA 91330 USA
[2] New Granada Mil Univ, Dept Math, Bogotá, Colombia
[3] State Univ Campinas UNICAMP, Dept Stat, Sao Paulo, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Parallelism; test statistic; K-means algorithm; ANOVA; clustering; DATA SET; MODEL; ALGORITHMS;
DOI
10.1080/00949655.2022.2053855
Chinese Library Classification
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Identifying the number K of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of K that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures in selecting the optimal K. The main idea is to use a combination of two test statistics, which measure the lack of parallelism and the mean distance between curves, to compute criteria such as the within and between cluster sum of squares. Simulations in challenging scenarios suggest that procedures using this measure can detect the correct number of clusters more frequently than existing methods in the literature. The application of the proposed method is illustrated on several real datasets.
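As a rough illustration of the criteria the abstract mentions, the sketch below computes within- and between-cluster sums of squares for curves sampled on a common grid. It is not the authors' method: it uses the plain squared Euclidean distance as a stand-in, whereas the paper's measure combines two test statistics (lack of parallelism and mean distance between curves) that the abstract does not specify. The function names and the toy data are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Curve = Sequence[float]  # one curve, evaluated on a grid shared by all curves


def mean_curve(curves: Sequence[Curve]) -> List[float]:
    """Pointwise mean of a non-empty collection of curves."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]


def sq_dist(a: Curve, b: Curve) -> float:
    """Squared Euclidean distance (assumed stand-in for the paper's measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between_ss(
    curves: Sequence[Curve], labels: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (within-cluster SS, between-cluster SS) for a given labelling."""
    grand = mean_curve(curves)

    # Group the curves by their cluster label.
    groups: Dict[Hashable, List[Curve]] = {}
    for curve, label in zip(curves, labels):
        groups.setdefault(label, []).append(curve)

    wss = 0.0
    bss = 0.0
    for members in groups.values():
        centre = mean_curve(members)
        # Spread of the members around their own cluster mean.
        wss += sum(sq_dist(c, centre) for c in members)
        # Size-weighted distance of the cluster mean from the grand mean.
        bss += len(members) * sq_dist(centre, grand)
    return wss, bss


if __name__ == "__main__":
    # Toy data: two increasing and two decreasing curves on a 3-point grid.
    toy_curves = [[0.0, 1.0, 2.0], [0.2, 1.2, 2.2], [5.0, 4.0, 3.0], [5.2, 4.2, 3.2]]
    toy_labels = [0, 0, 1, 1]
    print(within_between_ss(toy_curves, toy_labels))
```

A procedure for choosing K would typically evaluate such criteria for several candidate values of K, each with its own clustering, and compare them. How the paper does this with its proposed measure is not described in the abstract.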
Pages: 2980-2998
Number of pages: 19