A Sketch-based clustering algorithm for uncertain data streams

Cited by: 3
Authors
Affiliations
[1] School of Computer Science and Technology, Xidian University, Xi'an
[2] Software Engineering Institute, Xidian University, Xi'an
[3] College of Information Engineering, Qingdao University, Qingdao, Shandong
Keywords
Clustering; Data streams; Divergence; Sketch; Uncertainty
DOI
10.4304/jnw.8.7.1536-1542
Abstract
Because of inaccuracy and noise, uncertainty is inherent in time series streams and increases the complexity of stream clustering. Because the data arrive continuously and in massive volume, efficient data storage is crucial for clustering uncertain data streams. Using a hash-compressed structure, an extended uncertain sketch and an update strategy are proposed to store uncertain data streams. Based on divergence and the sketch metric, a sketch-based similarity is given to measure the distances between objects. Then, with core-sets and the max-min cluster distance measure, an algorithm for selecting initial cluster centers is proposed to improve the quality of clustering uncertain time series streams. Finally, experimental results illustrate the effectiveness of the proposed clustering algorithm. © 2013 ACADEMY PUBLISHER.
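The abstract gives no implementation details, so the following is only a minimal Python sketch of the general ideas it names, not the authors' algorithm. It assumes a Count-Min-style layout for the hash-compressed structure, existence probabilities as counter increments, a symmetrized Kullback-Leibler divergence as the sketch distance, and farthest-first traversal as a stand-in for max-min center selection. All class and function names are hypothetical.

```python
import hashlib
import math


class UncertainSketch:
    """Count-Min-style sketch whose counters accumulate existence probabilities."""

    def __init__(self, depth=4, width=64):
        self.depth = depth
        self.width = width
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        # Assumed hash family: one salted BLAKE2b digest per row.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item, prob):
        # An uncertain tuple (item, prob) adds its probability, not 1, to each row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += prob

    def estimate(self, item):
        # Count-Min estimate: the smallest counter the item maps to.
        return min(self.table[row][self._bucket(row, item)] for row in range(self.depth))

    def distribution(self, eps=1e-9):
        # Flatten and normalize the counters; eps smoothing keeps every entry positive.
        flat = [c + eps for row in self.table for c in row]
        total = sum(flat)
        return [c / total for c in flat]


def sketch_distance(a, b):
    # Symmetrized Kullback-Leibler divergence between the normalized counters.
    p, q = a.distribution(), b.distribution()
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def max_min_centers(sketches, k):
    # Farthest-first traversal: start from the first sketch, then repeatedly add
    # the sketch whose distance to its nearest chosen center is largest.
    centers = [0]
    while len(centers) < k:
        best = max(
            (i for i in range(len(sketches)) if i not in centers),
            key=lambda i: min(sketch_distance(sketches[i], sketches[c]) for c in centers),
        )
        centers.append(best)
    return centers
```

Under these assumptions, each stream gets one `UncertainSketch`, every arriving tuple is passed to `update(item, prob)`, and `max_min_centers(sketches, k)` returns the indices of `k` sketches to seed a clustering routine.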
Pages: 1536-1542
Number of pages: 6