Subspace clustering over high-dimensional data stream based on grid density and attribute relativity

Cited by: 0
Authors
Huang, Guoyan [1 ,2 ]
Miao, Liyun [1 ,2 ]
Ren, Jiadong [1 ,2 ]
Affiliations
[1] College of Information Science and Engineering, Yanshan University, Qinhuangdao City
[2] The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province
Source
Advances in Information Sciences and Service Sciences | 2012 / Vol. 4 / No. 17
Keywords
Attribute relativity; Best interesting subspace; Clustering; Grid density
DOI
10.4156/AISS.vol4.issue17.10
Abstract
Traditional clustering algorithms often fail to detect meaningful clusters in high-dimensional data spaces. To address this shortcoming, we propose GDRH-Stream, a clustering method for high-dimensional data streams based on attribute relativity and grid density, which consists of an online component and an offline component. First, the algorithm filters out redundant attributes by computing the relative entropy. Next, we define a weighted attribute relativity measure, estimate the relativity of the non-redundant attributes, and form attribute triples. Finally, the best interesting subspaces are searched using the attribute triples. In the online component, GDRH-Stream maps each data object into a grid cell and updates the characteristic vector of that cell. In the offline component, when a clustering request arrives, the best interesting subspaces are generated from the attribute relativity; the original grid structure is then projected onto these subspaces to form a new grid structure, and clustering is performed on the new grid structure using a density-grid-based approach. Experimental results show that GDRH-Stream achieves better clustering quality and scalability.
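The online/offline grid pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions of its own. The grid is equal-width. Each cell's characteristic vector is reduced to a plain object count. A cell counts as dense when its count reaches a hypothetical `min_density` threshold. Clusters are formed from face-adjacent dense cells. The subspace dimensions are also supplied by hand, standing in for the "best interesting subspace" that GDRH-Stream derives from relative entropy and attribute relativity, which this sketch does not model. The names `OnlineGrid`, `project`, and `density_clusters` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]


class OnlineGrid:
    """Hypothetical online component: map each arriving object to an
    equal-width grid cell and update that cell's characteristic vector
    (assumed here to be just an object count)."""

    def __init__(self, lows: Sequence[float], highs: Sequence[float], bins: int):
        self.lows = list(lows)
        self.highs = list(highs)
        self.bins = bins
        self.counts: Dict[Cell, int] = defaultdict(int)

    def cell_of(self, x: Sequence[float]) -> Cell:
        idx = []
        for v, lo, hi in zip(x, self.lows, self.highs):
            i = int((v - lo) / (hi - lo) * self.bins)
            idx.append(min(max(i, 0), self.bins - 1))  # clamp boundary values
        return tuple(idx)

    def insert(self, x: Sequence[float]) -> None:
        self.counts[self.cell_of(x)] += 1


def project(counts: Dict[Cell, int], dims: Sequence[int]) -> Dict[Cell, int]:
    """Offline step: project full-space cells onto a subspace by summing
    the counts of all cells that share coordinates on `dims`."""
    out: Dict[Cell, int] = defaultdict(int)
    for cell, c in counts.items():
        out[tuple(cell[d] for d in dims)] += c
    return dict(out)


def density_clusters(counts: Dict[Cell, int], min_density: int) -> List[Set[Cell]]:
    """Keep cells with count >= min_density and group face-adjacent dense
    cells into clusters with a breadth-first search."""
    dense = {c for c, n in counts.items() if n >= min_density}
    seen: Set[Cell] = set()
    clusters: List[Set[Cell]] = []
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            cell = queue.popleft()
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        comp.add(nb)
                        queue.append(nb)
        clusters.append(comp)
    return clusters


if __name__ == "__main__":
    grid = OnlineGrid(lows=[0, 0, 0], highs=[10, 10, 10], bins=5)
    for x in [(1, 1, 9), (1.5, 2.5, 0.5), (3, 1, 5), (9, 9, 1), (9.5, 8.5, 8)]:
        grid.insert(x)
    print(density_clusters(grid.counts, 2))          # [] : no dense cell in full space
    sub = project(grid.counts, dims=(0, 1))          # hand-picked subspace
    print(density_clusters(sub, 2))                  # [{(4, 4)}]
```

In this toy stream, every full-space cell holds a single object, so nothing is dense. After projection onto dimensions 0 and 1, two objects fall into cell `(4, 4)`, which then becomes a cluster. This illustrates why clustering in a well-chosen subspace can recover structure that the full-dimensional grid hides.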
Pages: 91-99
Page count: 8