Selection of the number of clusters in functional data analysis

Cited by: 2

Authors:

Zambom, Adriano Zanin [1]

Alfonso Collazos, Julian [2]

Dias, Ronaldo [3]

Affiliations:

[1] Calif State Univ Northridge, Dept Math, 18111 Nordhoff St, Northridge, CA 91330 USA

[2] New Granada Mil Univ, Dept Math, Bogotá, Colombia

[3] State Univ Campinas UNICAMP, Dept Stat, Sao Paulo, SP, Brazil

Source:

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION | 2022 / Vol. 92 / Issue 14

Funding:

São Paulo Research Foundation, Brazil;

Keywords:

Parallelism; test statistic; K-means algorithm; ANOVA; clustering; DATA SET; MODEL; ALGORITHMS;

DOI:

10.1080/00949655.2022.2053855

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Identifying the number K of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of K that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures in selecting the optimal K. The main idea is to use a combination of two test statistics, which measure the lack of parallelism and the mean distance between curves, to compute criteria such as the within and between cluster sum of squares. Simulations in challenging scenarios suggest that procedures using this measure can detect the correct number of clusters more frequently than existing methods in the literature. The application of the proposed method is illustrated on several real datasets.
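The abstract does not give the two test statistics, so the following is only a rough sketch of the general idea, under stated assumptions: a within-cluster dispersion built from a dissimilarity that mixes a level term (mean gap between curves) with a shape term (departure from parallelism). The functions `pair_dissimilarity` and `within_cluster_dispersion`, the `weight` parameter, and the simulated curves are all hypothetical and are not the statistics proposed in the paper.

```python
import numpy as np

def pair_dissimilarity(x, y, weight=0.5):
    """Hypothetical dissimilarity between two curves on a common grid.

    Not the paper's statistic: a simple stand-in that mixes a level term
    and a shape (non-parallelism) term.
    """
    d = x - y
    level = d.mean() ** 2   # squared mean vertical gap between the curves
    shape = d.var()         # zero when the curves are exactly parallel
    return weight * level + (1.0 - weight) * shape

def within_cluster_dispersion(curves, labels, weight=0.5):
    """Sum of dissimilarities between each curve and its cluster mean curve."""
    total = 0.0
    for k in np.unique(labels):
        members = curves[labels == k]
        center = members.mean(axis=0)
        total += sum(pair_dissimilarity(c, center, weight) for c in members)
    return total

# Simulated example (assumed data): two groups of noisy curves.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
group_a = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.1, size=(10, t.size))
group_b = 2.0 * t + rng.normal(0.0, 0.1, size=(10, t.size))
curves = np.vstack([group_a, group_b])

true_labels = np.repeat([0, 1], 10)
single_label = np.zeros(20, dtype=int)

w_true = within_cluster_dispersion(curves, true_labels)
w_single = within_cluster_dispersion(curves, single_label)
print(w_true, w_single)
```

In a selection procedure, a dispersion of this kind would be evaluated over a range of candidate K and fed into a criterion such as an elbow rule or a gap-type comparison; which procedures the authors pair with their measure is not stated in the abstract.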


Pages: 2980 - 2998

Page count: 19

50 items in total

[41] Variable selection and data fusion for diesel cetane number prediction
Buendia-Garcia, J.
Lacoue-Negre, M.
Gornay, J.
Mas-Garcia, S.
Bendoula, R.
Roger, J. M.
FUEL, 2023, 332
[42] On finding the number of clusters
Kothari, R
Pitts, D
PATTERN RECOGNITION LETTERS, 1999, 20 (04) : 405 - 416
[43] An automatic method to determine the number of clusters using decision-theoretic rough set
Yu, Hong
Liu, Zhanguo
Wang, Guoyin
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2014, 55 (01) : 101 - 115
[44] Joint selection of variables and clusters: recovering the underlying structure of marketing data
Brudvig, Susan
Brusco, Michael J.
Cradit, J. Dennis
JOURNAL OF MARKETING ANALYTICS, 2019, 7 (01) : 1 - 12
[45] Nonstationary Gaussian Process Discriminant Analysis With Variable Selection for High-Dimensional Functional Data
Yu, Weichang
Wade, Sara
Bondell, Howard D.
Azizi, Lamiae
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (02) : 588 - 600
[46] Functional data analysis in shape analysis
Epifanio, Irene
Ventura-Campos, Noelia
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2011, 55 (09) : 2758 - 2773
[47] Estimating the number of clusters in multivariate data by various fittings of the L-curve
Moustafa, Rida
Hadi, Ali S.
COMPUTATIONAL & APPLIED MATHEMATICS, 2025, 44 (01):
[48] Estimating the number of clusters in a numerical data set via quantization error modeling
Kolesnikov, Alexander
Trichina, Elena
Kauranne, Tuomo
PATTERN RECOGNITION, 2015, 48 (03) : 941 - 952
[49] On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters
Papayiannis, G. I.
Domazakis, G. N.
Drivaliaris, D.
Koukoulas, S.
Tsekrekos, A. E.
Yannacopoulos, A. N.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2021, 91 (13) : 2569 - 2594
[50] Local and Global Data Spread Based Index for Determining Number of Clusters in a Dataset
Riyaz, Romana
Wani, M. Arif
2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016), 2016, : 651 - 656
