A new distance with derivative information for functional k-means clustering algorithm

Cited by: 50
Authors
Meng, Yinfeng [1 ]
Liang, Jiye [2 ]
Cao, Fuyuan [2 ]
He, Yijun [1 ]
Affiliations
[1] Shanxi Univ, Sch Math Sci, Taiyuan 030006, Shanxi, Peoples R China
[2] Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Functional data; Functional k-means clustering algorithm; Cluster centroid; Derivative information; Variational theory; TIME-SERIES; CLASSIFICATION; MODELS; CURVE;
DOI
10.1016/j.ins.2018.06.035
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The functional k-means clustering algorithm is a widely used method for clustering functional data. However, this algorithm does not take derivative information into account when calculating the similarity between two functional samples. In fact, derivative information is important for capturing differences in trend characteristics among functional data. In this paper, we define a novel distance for measuring the similarity between functional samples that incorporates their derivative information. Furthermore, we theoretically construct cluster centroids that minimize the objective function of the functional k-means clustering algorithm based on the proposed distance. After preprocessing the functional data with three common basis representation techniques, we compare the clustering performance of functional k-means clustering algorithms based on four different similarity metrics. Experiments on six labeled data sets statistically demonstrate the effectiveness and robustness of the functional k-means clustering algorithm with the defined distance. In addition, experimental results on three real-life data sets verify the convergence and practicability of the functional k-means clustering algorithm with the defined distance. (C) 2018 Elsevier Inc. All rights reserved.
Pages: 166-185
Number of pages: 20
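The abstract above describes a distance that augments the usual L2 distance between curves with the L2 distance between their derivatives, and a k-means procedure built on it. Below is a minimal Python sketch of that idea on discretized curves. The weighting parameter w, the finite-difference derivative via np.gradient, and the pointwise-mean centroid update are illustrative assumptions; they simplify the variational centroid construction the paper derives and are not the authors' exact formulation.

```python
import numpy as np

def derivative_augmented_distance(f, g, t, w=0.5):
    """Distance between two discretized curves f and g sampled at points t,
    combining the L2 distance of the curves and of their numerical derivatives.
    The weight w and the finite-difference derivative are illustrative choices,
    not necessarily those used in the paper."""
    df = np.gradient(f, t)
    dg = np.gradient(g, t)
    d_curve = np.trapz((f - g) ** 2, t)   # squared L2 distance of the curves
    d_deriv = np.trapz((df - dg) ** 2, t) # squared L2 distance of the derivatives
    return np.sqrt((1 - w) * d_curve + w * d_deriv)

def functional_kmeans(X, t, k, w=0.5, n_iter=50, seed=0):
    """Plain k-means on discretized curves X (n_samples x n_points) using the
    derivative-augmented distance; centroids are updated as pointwise means,
    a simplification of the variational centroid construction in the paper."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each curve to the nearest centroid under the augmented distance
        labels = np.array([
            np.argmin([derivative_augmented_distance(x, c, t, w) for c in centroids])
            for x in X
        ])
        # update each centroid as the pointwise mean of its assigned curves
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

As a usage sketch, curves reconstructed from a basis representation (e.g. B-splines, as discussed in the paper) can be evaluated on a common grid t and passed to functional_kmeans as the rows of X; setting w=0 recovers ordinary L2-based functional k-means, while larger w emphasizes trend (derivative) differences.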