A framework for dissimilarity-based partitioning clustering of categorical time series

Cited by: 13
Authors
Garcia-Magarinos, Manuel [1]
Vilar, José A. [1]
Affiliations
[1] Univ A Coruna, Dept Math, La Coruna 15071, Spain
Keywords
Categorical time series; Dissimilarity-based clustering; k-Means algorithm; Estimating number of clusters; Data visualization; ALGORITHM;
DOI
10.1007/s10618-014-0357-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A new framework for clustering categorical time series is proposed. In our approach, a dissimilarity-based partitioning method is considered. We suggest measuring the dissimilarity between two categorical time series by assessing both closeness of raw categorical values and proximity between dynamic behaviours. For the latter, a particular index computing the temporal correlation for categorical-valued sequences is introduced. The dissimilarity measure is then used to perform clustering by considering a modified version of the k-modes algorithm specifically designed to provide with a better characterization of the clusters. Furthermore, the problem of determining the number of clusters in this framework is analyzed by comparing a range of procedures, including a prediction-based resampling method properly adjusted to deal with our dissimilarity. Several graphical devices to interpret and visualize the temporal pattern of each cluster are also provided. Performance of this clustering methodology is studied on different simulated scenarios and its effectiveness is concluded by comparison with alternative approaches. Real data use is illustrated by analyzing navigation patterns of users visiting a specific news web site.
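The general idea in the abstract (a dissimilarity that mixes closeness of raw values with similarity of dynamics, fed into a k-modes style partitioning loop) can be illustrated with a minimal sketch. This is not the authors' method: the record does not give their temporal correlation index or their modified k-modes, so `change_mismatch`, the weight `alpha`, the seeding with the first `k` series, and the toy data below are all assumptions made purely for illustration.

```python
from collections import Counter


def value_mismatch(x, y):
    """Fraction of time points at which the two series take different categories."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def change_mismatch(x, y):
    """Stand-in for a dynamics term (assumption, not the paper's index):
    fraction of steps where one series changes category and the other does not."""
    steps = len(x) - 1
    if steps == 0:
        return 0.0
    return sum((x[t] != x[t + 1]) != (y[t] != y[t + 1]) for t in range(steps)) / steps


def dissimilarity(x, y, alpha=0.5):
    """Convex combination of the value term and the stand-in dynamics term."""
    return alpha * value_mismatch(x, y) + (1 - alpha) * change_mismatch(x, y)


def positionwise_mode(members):
    """Cluster prototype: the most frequent category at each time point."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def k_modes(series, k, alpha=0.5, max_iter=20):
    """Plain k-modes loop, seeded with the first k series (hypothetical choice)."""
    modes = [tuple(s) for s in series[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each series goes to its closest prototype.
        new_labels = [
            min(range(k), key=lambda j: dissimilarity(s, modes[j], alpha))
            for s in series
        ]
        if new_labels == labels:
            break  # assignments are stable, so the prototypes are too
        labels = new_labels
        # Update step: recompute each prototype from its current members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                modes[j] = positionwise_mode(members)
    return labels, modes


# Toy data (made up): equal-length series over the categories A-D.
toy_series = [
    "AAAABBBB", "CDCDCDCD", "AAAABBBA",
    "CDCDCDCC", "AAABBBBB", "DDCDCDCD",
]
labels, modes = k_modes(toy_series, k=2)
print(labels, ["".join(m) for m in modes])
```

The sketch assumes all series have equal length. Setting `alpha=1` reduces the dissimilarity to a plain mismatch fraction, which makes it easy to see what the dynamics term contributes.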
Pages: 466-502
Number of pages: 37
Related papers
35 in total
[1]   A k-mean clustering algorithm for mixed numeric and categorical data [J].
Ahmad, Amir ;
Dey, Lipika .
DATA & KNOWLEDGE ENGINEERING, 2007, 63 (02) :503-527
[2]  
[Anonymous], 1995, PATTERN RECOGNITION
[3]   A CLUSTERING PERFORMANCE-MEASURE BASED ON FUZZY SET DECOMPOSITION [J].
BACKER, E ;
JAIN, AK .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1981, 3 (01) :66-75
[4]   A novel attribute weighting algorithm for clustering high-dimensional categorical data [J].
Bai, Liang ;
Liang, Jiye ;
Dang, Chuangyin ;
Cao, Fuyuan .
PATTERN RECOGNITION, 2011, 44 (12) :2843-2861
[5]  
Baldi P, 2003, MODELING INTERNET TH
[6]  
Bouguessa M, 2013, DATA MIN KNOWL DISC, V24, P1
[7]   Model-based clustering and visualization of navigation patterns on a web site [J].
Cadez, I ;
Heckerman, D ;
Meek, C ;
Smyth, P ;
White, S .
DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (04) :399-424
[8]  
Caliński T., 1974, Commun. Stat.-Simul. Comput, V3, P1, DOI 10.1080/03610927408827101
[9]   A Framework for Clustering Categorical Time-Evolving Data [J].
Cao, Fuyuan ;
Liang, Jiye ;
Bai, Liang ;
Zhao, Xingwang ;
Dang, Chuangyin .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2010, 18 (05) :872-882
[10]   Top-down parameter-free clustering of high-dimensional categorical data [J].
Cesario, Eugenio ;
Manco, Giuseppe ;
Ortale, Riccardo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (12) :1607-1624