Subspace clustering of high dimensional data streams

Cited by: 5
Authors
Wang, Shuyun [2 ]
Fan, Yingjie [2 ]
Zhang, Chenghong [1 ]
Xu, HeXiang [2 ]
Hao, Xiulan [2 ]
Hu, Yunfa [2 ]
Affiliations
[1] Fudan Univ, Sch Management, Shanghai 200433, Peoples R China
[2] Fudan Univ, Dept Comp & Informat Technol, Shanghai 200433, Peoples R China
Source
7TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE IN CONJUNCTION WITH 2ND IEEE/ACIS INTERNATIONAL WORKSHOP ON E-ACTIVITY, PROCEEDINGS | 2008
Keywords
DOI
10.1109/ICIS.2008.58
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents SOStream, a novel subspace-based algorithm for clustering high-dimensional online data streams. SOStream partitions the data space into grids and maintains a superset of all dense units in an online way. Deterministic lower and upper bounds on the selectivity of each maintained unit are also given. Using the maintained potential dense units, SOStream can discover clusters of arbitrary shape in different subspaces of a high-dimensional data stream. Experimental results on real and synthetic datasets demonstrate the effectiveness of the approach.
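The abstract does not give SOStream's data structures, so the following is only a minimal sketch of the general idea it describes: grid units are counted online, a superset of the dense units is kept, and each kept unit carries a deterministic lower and upper bound on its selectivity. The bookkeeping here is borrowed from the well-known Lossy Counting scheme and is not claimed to be the authors' method. The class name `GridUnitCounter`, the parameters `cells_per_dim` and `epsilon`, the restriction to one-dimensional units, and the assumption that attribute values lie in [0, 1) are all hypothetical choices made for illustration.

```python
class GridUnitCounter:
    """Illustrative tracker of 1-D grid units over a stream (not SOStream itself).

    Assumptions: values lie in [0, 1); counting follows Lossy Counting.
    """

    def __init__(self, cells_per_dim, epsilon):
        self.cells = cells_per_dim
        # Bucket width w = ceil(1 / epsilon), computed without float rounding up.
        self.width = int(1 / epsilon)
        if self.width * epsilon < 1:
            self.width += 1
        self.n = 0            # points seen so far
        self.units = {}       # (dim, cell) -> [count, max_missed_count]

    def _cell(self, value):
        # Map a value in [0, 1) to its grid cell index along one dimension.
        return min(int(value * self.cells), self.cells - 1)

    def add(self, point):
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width  # current bucket id
        for dim, value in enumerate(point):
            key = (dim, self._cell(value))
            if key in self.units:
                self.units[key][0] += 1
            else:
                # A new entry may have missed at most bucket - 1 occurrences.
                self.units[key] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At a bucket boundary, drop units that cannot be frequent.
            self.units = {k: v for k, v in self.units.items()
                          if v[0] + v[1] > bucket}

    def selectivity_bounds(self, key):
        """Return (lower, upper) bounds on the fraction of points in the unit."""
        if self.n == 0:
            return (0.0, 0.0)
        if key in self.units:
            count, missed = self.units[key]
            return (count / self.n, (count + missed) / self.n)
        # A unit that is not tracked occurred at most once per completed bucket.
        return (0.0, (self.n // self.width) / self.n)

    def potential_dense_units(self, threshold):
        """Units whose upper bound reaches the density threshold."""
        return {k for k in self.units
                if self.selectivity_bounds(k)[1] >= threshold}
```

Under these assumptions, `potential_dense_units(threshold)` contains every truly dense unit whenever `threshold` is larger than `epsilon`, because a dropped unit has selectivity of at most `epsilon`. Combining adjacent dense units into clusters, and extending units to higher-dimensional subspaces, is left out of this sketch.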
Pages: 165 / +
Page count: 2