Research on Dynamic Data Streams Clustering Algorithm -Pdstream based on PCA and Density

Cited by: 0
Authors
Zheng, Mei [1]
Ju, Chunhua [1]
Rui, Zhang [1]
Affiliations
[1] Zhejiang Gongshang Univ, Coll Comp Sci & Informat, Hangzhou, Zhejiang, Peoples R China
Source
ADVANCED MECHANICAL ENGINEERING, PTS 1 AND 2 | 2010 / Vol. 26-28
Keywords
data streams; principal component analysis; density; sliding window
DOI
10.4028/www.scientific.net/AMM.26-28.108
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Clustering of data streams has become a focus of research in data stream mining. Because the volume of streaming data is very large while the computer's memory and processing time are limited, it is difficult to cluster such data quickly and effectively. To address this problem, we design an improved clustering algorithm for dynamic data streams, PDStream, based on principal component analysis and density. PDStream overcomes two shortcomings of earlier methods: the STREAM algorithm is dominated by historical data, and the CluStream algorithm has difficulty describing non-spherical clusters and phasing out "old data", which results in a huge amount of retained data. In experiments comparing PDStream with the STREAM algorithm, PDStream handles massive data better and produces higher-quality clusters.
Pages: 108-112
Number of pages: 5