Clustering Algorithm for High Dimensional Data Stream over Sliding Windows

Cited by: 12
Authors
Liu, Weiguo [1 ]
OuYang, Jia [1 ]
Affiliations
[1] Cent South Univ, Sch Informat Sci & Engn, Changsha 410083, Hunan, Peoples R China
Source
TRUSTCOM 2011: 2011 INTERNATIONAL JOINT CONFERENCE OF IEEE TRUSTCOM-11/IEEE ICESS-11/FCST-11 | 2011
Funding
National Natural Science Foundation of China;
Keywords
clustering algorithm; data stream; sliding window; projected clustering; exponential histogram;
DOI
10.1109/TrustCom.2011.213
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Data stream clustering faces significant challenges in memory usage and processing speed. In addition, much stream data is high-dimensional in nature, and high-dimensional data are inherently harder to cluster. This paper proposes a clustering algorithm, referred to as HSWStream, for high-dimensional data streams over sliding windows. The algorithm handles high dimensionality with a projected clustering technique, tracks in-cluster evolution with an exponential histogram of cluster features (EHCF), and eliminates the influence of old points with fading temporal cluster features (FTCF). Through the exponential histogram mechanism, more information is kept about recent data and less about old data, which matches the evolving nature of data streams. Projected clustering yields higher cluster quality and faster execution, while the sliding window yields higher quality and lower memory usage. For further efficiency, a fast computational method is used to maintain the EHCF. Its main idea is that a new data point need not be processed immediately; processing is deferred until an FTCF must be deleted from the corresponding EHCF. The experiments use evolving data streams built from the KDD-CUP'98 and KDD-CUP'99 real data sets and from synthetic data sets. The experimental results show that the proposed method achieves higher clustering quality, lower memory usage, and faster processing than the other algorithms compared.
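The abstract does not give the EHCF update rules, so the following is only a minimal sketch of a generic exponential histogram of cluster features over a sliding window, not the authors' HSWStream implementation. It assumes each bucket stores a count, per-dimension linear and squared sums, and the timestamp of its newest point; that at most `max_per_size` buckets of equal size are kept before the two oldest are merged; and that a bucket expires once its newest point leaves the window. The names `Bucket`, `ExpHistogram`, `window`, and `max_per_size` are hypothetical, and projection, fading, and the deferred-update optimization are all omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    n: int            # number of points summarized by this bucket
    ls: List[float]   # per-dimension linear sum of those points
    ss: List[float]   # per-dimension sum of squares of those points
    t: int            # timestamp of the most recent point in the bucket


def merge(older: Bucket, newer: Bucket) -> Bucket:
    """Combine two adjacent buckets; the result keeps the newer timestamp."""
    return Bucket(
        older.n + newer.n,
        [a + b for a, b in zip(older.ls, newer.ls)],
        [a + b for a, b in zip(older.ss, newer.ss)],
        newer.t,
    )


class ExpHistogram:
    """Assumed structure: buckets ordered oldest first, sizes are powers of two."""

    def __init__(self, window: int, max_per_size: int = 2):
        self.window = window
        self.max_per_size = max_per_size
        self.buckets: List[Bucket] = []

    def insert(self, point: List[float], t: int) -> None:
        # Drop buckets whose newest point has left the window (t - window, t].
        self.buckets = [b for b in self.buckets if b.t > t - self.window]
        # Each new point starts as its own bucket of size 1.
        self.buckets.append(Bucket(1, list(point), [x * x for x in point], t))
        # Cascade merges: too many buckets of one size -> merge the two oldest.
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(idx) <= self.max_per_size:
                break
            i = idx[0]  # equal-size buckets are contiguous, so i + 1 matches
            self.buckets[i:i + 2] = [merge(self.buckets[i], self.buckets[i + 1])]
            size *= 2

    def count(self) -> int:
        return sum(b.n for b in self.buckets)

    def mean(self) -> List[float]:
        n = self.count()
        dims = len(self.buckets[0].ls)
        return [sum(b.ls[d] for b in self.buckets) / n for d in range(dims)]


# Usage: ten one-dimensional points with a window large enough to keep them all.
eh = ExpHistogram(window=100)
for t in range(1, 11):
    eh.insert([float(t)], t)
print([b.n for b in eh.buckets], eh.count(), eh.mean())
```

Because older points end up in larger buckets, the oldest part of the window is summarized coarsely while recent points keep fine-grained buckets, which is the "more information of recent data" property the abstract describes. In general the count over the window is approximate, since an expiring bucket may still contain some in-window points.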
Pages: 1537-1542
Page count: 6