Research on data stream clustering algorithms

Cited by: 36
Authors
Ding, Shifei [1, 2]
Wu, Fulin [1]
Qian, Jun [1]
Jia, Hongjie [1]
Jin, Fengxiang [3]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[3] Shandong Univ Sci & Technol, Coll Geomat, Qingdao 266590, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data mining; Data stream; Clustering; Data model;
DOI
10.1007/s10462-013-9398-7
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A data stream is a potentially massive, continuous, and rapid sequence of data. It has attracted considerable attention and research effort in the field of data mining. Clustering is an effective data mining tool, so data stream clustering is bound to become a focus of research in data stream mining. Many effective data stream clustering algorithms have been proposed to address the high-dimensional, dynamic, and real-time characteristics of data streams. In addition, data stream information is uncertain and often contains outliers and noise, so developing effective data stream clustering algorithms is crucial. This paper reviews the development and trends of data stream clustering and analyzes typical data stream clustering algorithms proposed in recent years, such as the BIRCH algorithm, the Local Search algorithm, the Stream algorithm, and the CluStream algorithm. We also summarize the latest research achievements in this field and introduce some new strategies for dealing with outliers and noisy data. Finally, we put forward the focal points and difficulties of future research on data stream clustering.
Pages: 593-600
Page count: 8