A k-means procedure based on a Mahalanobis type distance for clustering multivariate functional data

Cited by: 0
Authors
Andrea Martino
Andrea Ghiglietti
Francesca Ieva
Anna Maria Paganoni
Affiliations
[1] Politecnico di Milano, MOX
[2] Livanova, Modelling and Scientific Computing, Department of Mathematics
Source
Statistical Methods & Applications | 2019 / Vol. 28
Keywords
Distances in $L^2$; $k$-means algorithm; Multivariate functional data; 62H30; 62M86
DOI
Not available
Abstract
This paper proposes a clustering procedure for samples of multivariate functions in $(L^2(I))^{J}$, with $J\ge 1$. This method is based on a k-means algorithm in which the distance between the curves is measured with a metric that generalizes the Mahalanobis distance in Hilbert spaces, considering the correlation and the variability along all the components of the functional data. The proposed procedure has been studied in simulation and compared with the k-means based on other distances typically adopted for clustering multivariate functional data. In these simulations, it is shown that the k-means algorithm with the generalized Mahalanobis distance provides the best clustering performances, both in terms of mean and standard deviation of the number of misclassified curves. Finally, the proposed method has been applied to two case studies, concerning ECG signals and growth curves, where the results obtained in simulation are confirmed and strengthened.
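The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each multivariate curve has been sampled on a common uniform grid and its J components concatenated into one row of `X`, that the covariance is estimated once from the whole sample, and that the Mahalanobis-type distance is a ridge-regularized version in which each squared principal component score is divided by `eigval + 1/p`. The function names, the parameter `p`, and this particular regularization are assumptions made for illustration; the paper's exact definition of the generalized distance may differ.

```python
import numpy as np

def mahalanobis_type_sqdist(X, centers, eigvals, eigvecs, p):
    """Squared regularized Mahalanobis-type distances, shape (n, k).

    Assumption: the score along eigenvector j is weighted by
    1 / (eigvals[j] + 1/p); large p approaches the classical distance.
    """
    diffs = X[:, None, :] - centers[None, :, :]   # (n, k, d)
    scores = diffs @ eigvecs                      # projections on eigenvectors
    return (scores ** 2 / (eigvals + 1.0 / p)).sum(axis=2)

def kmeans_mahalanobis(X, k, p=100.0, n_iter=50, seed=0):
    """k-means on discretized curves with the distance defined above."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Covariance of the whole sample (an assumption of this sketch).
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)         # guard against round-off
    # Random initial centers drawn from the data (fancy indexing copies).
    centers = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        d2 = mahalanobis_type_sqdist(X, centers, eigvals, eigvecs, p)
        new_labels = d2.argmin(axis=1)            # assignment step
        if np.array_equal(new_labels, labels):
            break                                 # assignments are stable
        labels = new_labels
        for c in range(k):                        # update step
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels, centers
```

A call such as `labels, centers = kmeans_mahalanobis(X, k=2, p=100.0)` returns one cluster label per curve and the cluster means on the grid. Because the covariance here is computed from the pooled sample, directions that separate the groups are also down-weighted, so the result is sensitive to how the covariance is estimated and to the choice of `p`.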
Pages: 301-322
Number of pages: 21