Divisive clustering of high dimensional data streams

Cited by: 3
Authors
Hofmeyr, David P. [1]
Pavlidis, Nicos G. [2]
Eckley, Idris A. [1]
Affiliations
[1] Univ Lancaster, Dept Math & Stat, Lancaster LA1 4YF, England
[2] Univ Lancaster, Dept Management Sci, Lancaster LA1 4YX, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Clustering; Data stream; High dimensionality; Population drift; Modality testing; DENSITY; TREE;
DOI
10.1007/s11222-015-9597-y
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.
Pages: 1101-1120
Number of pages: 20