A proposal for robust curve clustering

Cited by: 81
Authors
García-Escudero, LA [1]
Gordaliza, A [1]
Affiliations
[1] Univ Valladolid, Dept Estadist & Invest Operat, E-47002 Valladolid, Spain
Keywords
functional data; clustering; k-means; trimmed k-means; robustness
DOI
10.1007/s00357-005-0013-8
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Functional data sets appear in many areas of science. Although each data point may be seen as a large finite-dimensional vector, it is preferable to think of them as functions, and many classical multivariate techniques have been generalized to this kind of data. A widely used technique for dealing with functional data is to choose a finite-dimensional basis and find the best projection of each curve onto this basis. Therefore, given a functional basis, one approach to curve clustering applies the k-means methodology to the fitted basis coefficients of all the curves in the data set. Unfortunately, this approach inherits a serious drawback: the lack of robustness of k-means. Trimmed k-means clustering (Cuesta-Albertos, Gordaliza, and Matran 1997) provides a robust alternative to k-means and, consequently, may be used successfully in this functional framework. The proposed approach is exemplified with cubic B-spline bases, but other bases can be applied analogously depending on the application at hand.
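The two steps described in the abstract (project each curve onto a cubic B-spline basis, then run trimmed k-means on the fitted coefficients) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `bspline_design` and `trimmed_kmeans`, the knot placement, the trimming fraction `alpha`, the random-restart scheme, and the synthetic sine/cosine curves are all hypothetical choices made for the example. The trimmed k-means step uses a simple alternating heuristic (assign, trim the farthest fraction, update the means).

```python
import numpy as np

def bspline_design(x, knots, degree=3):
    """Design matrix of B-spline basis functions (Cox-de Boor recursion)."""
    x, t = np.asarray(x, float), np.asarray(knots, float)
    # Degree 0: indicators of the half-open knot intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval so that x == t[-1] is covered.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        Bn = np.zeros((x.size, t.size - 1 - d))
        for j in range(t.size - 1 - d):
            left, right = t[j + d] - t[j], t[j + d + 1] - t[j + 1]
            if left > 0:
                Bn[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = Bn
    return B

def trimmed_kmeans(X, k, alpha, n_init=10, n_iter=100, seed=0):
    """Heuristic trimmed k-means: label -1 marks the trimmed observations."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = n - int(np.floor(alpha * n))  # observations that are not trimmed
    best = None
    for _start in range(n_init):
        centers = X[rng.choice(n, size=k, replace=False)]
        for _step in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)
            dmin = d2[np.arange(n), nearest]
            keep = np.argsort(dmin)[:n_keep]   # drop the farthest alpha fraction
            labels = np.full(n, -1)
            labels[keep] = nearest[keep]
            new_centers = centers.copy()
            for c in range(k):
                members = X[labels == c]
                if len(members):               # leave an empty cluster's center as is
                    new_centers[c] = members.mean(axis=0)
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        obj = dmin[keep].sum()                 # trimmed within-cluster sum of squares
        if best is None or obj < best[0]:
            best = (obj, centers, labels)
    return best[1], best[2]

# Synthetic example (assumed data): two groups of curves plus four outlying curves.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
knots = np.concatenate(([0.0] * 4, [0.25, 0.5, 0.75], [1.0] * 4))  # clamped cubic knots
B = bspline_design(x, knots)                                      # shape (50, 7)

curves = np.vstack([
    np.sin(2 * np.pi * x) + rng.normal(0, 0.05, (20, x.size)),
    np.cos(2 * np.pi * x) + rng.normal(0, 0.05, (20, x.size)),
    6.0 + rng.normal(0, 1.0, (4, x.size)),                        # outlying curves
])
# Least-squares projection: one row of basis coefficients per curve.
coefs = np.linalg.lstsq(B, curves.T, rcond=None)[0].T
centers, labels = trimmed_kmeans(coefs, k=2, alpha=0.1, n_init=20)
```

In this sketch, curves with `labels == -1` are the ones trimmed as potential outliers, and the rows of `B @ centers.T` give the cluster centers evaluated back on the grid `x`. Swapping in another basis would only require a different design matrix.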
Pages: 185-201
Number of pages: 17
Related papers
30 in total
  • [1] Unsupervised curve clustering using B-splines
    Abraham, C
    Cornillon, PA
    Matzner-Lober, E
    Molinari, N
    [J]. SCANDINAVIAN JOURNAL OF STATISTICS, 2003, 30 (03) : 581 - 595
  • [2] Cadez I. V., 2000, Proceedings. KDD-2000. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P140, DOI 10.1145/347090.347119
  • [3] CRELLIN N, 1997, COMPUTING SCI STAT I
  • [4] THE STRONG LAW OF LARGE NUMBERS FOR K-MEANS AND BEST POSSIBLE NETS OF BANACH VALUED RANDOM-VARIABLES
    CUESTA, JA
    MATRAN, C
    [J]. PROBABILITY THEORY AND RELATED FIELDS, 1988, 78 (04) : 523 - 534
  • [5] Cuesta-Albertos JA, 1997, ANN STAT, V25, P553
  • [6] CUESTAALBERTOS JA, 2005, IN PRESS AM MATH SOC
  • [7] De Boor C., 1978, PRACTICAL GUIDE SPLI, DOI 10.1007/978-1-4612-6333-3
  • [8] DESOETE G, 1993, INFORM CLASSIFICATIO
  • [9] Eubank R.L., 1988, SPLINE SMOOTHING NON
  • [10] REPRESENTING A LARGE COLLECTION OF CURVES - A CASE FOR PRINCIPAL POINTS
    FLURY, BD
    TARPEY, T
    [J]. AMERICAN STATISTICIAN, 1993, 47 (04) : 304 - 306