Ant Colony Stream Clustering: A Fast Density Clustering Algorithm for Dynamic Data Streams

被引:71
作者
Fahy, Conor [1 ]
Yang, Shengxiang [1 ]
Gongora, Mario [1 ]
机构
[1] De Montfort Univ, Ctr Computat Intelligence, Sch Comp Sci & Informat, Leicester LE1 9BH, Leics, England
基金
英国工程与自然科学研究理事会;
关键词
Ant colony; concept drift; concept evolution; data stream clustering; density clustering;
D O I
10.1109/TCYB.2018.2822552
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
A data stream is a continuously arriving sequence of data and clustering data streams requires additional considerations to traditional clustering. A stream is potentially unbounded, data points arrive online and each data point can be examined only once. This imposes limitations on available memory and processing time. Furthermore, streams can be noisy and the number of clusters in the data and their statistical properties can change over time. This paper presents an online, bio-inspired approach to clustering dynamic data streams. The proposed ant colony stream clustering (ACSC) algorithm is a density-based clustering algorithm, whereby clusters are identified as high-density areas of the feature space separated by low-density areas. ACSC identifies clusters as groups of micro-clusters. The tumbling window model is used to read a stream and rough clusters are incrementally formed during a single pass of a window. A stochastic method is employed to find these rough clusters, this is shown to significantly speeding up the algorithm with only a minor cost to performance, as compared to a deterministic approach. The rough clusters are then refined using a method inspired by the observed sorting behavior of ants. Ants pick-up and drop items based on the similarity with the surrounding items. Artificial ants sort clusters by probabilistically picking and dropping microclusters based on local density and local similarity. Clusters are summarized using their constituent micro-clusters and these summary statistics are stored offline. Experimental results show that the clustering quality of ACSC is scalable, robust to noise and favorable to leading ant clustering and stream-clustering algorithms. It also requires fewer parameters and less computational time.
引用
收藏
页码:2215 / 2228
页数:14
相关论文
共 44 条
[1]  
[Anonymous], 2003, P 29 INT C VER LARG
[2]  
[Anonymous], 1991, P FIRST INT C SIMULA
[3]  
Azzag H, 2003, IEEE C EVOL COMPUTAT, P2642
[4]   DEC: Dynamically Evolving Clustering and Its Application to Structure Identification of Evolving Fuzzy Models [J].
Baruah, Rashmi Dutta ;
Angelov, Plamen .
IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (09) :1619-1631
[5]   FCM - THE FUZZY C-MEANS CLUSTERING-ALGORITHM [J].
BEZDEK, JC ;
EHRLICH, R ;
FULL, W .
COMPUTERS & GEOSCIENCES, 1984, 10 (2-3) :191-203
[6]  
Bifet A, 2010, J MACH LEARN RES, V11, P1601
[7]   Finding groups in data: Cluster analysis with ants [J].
Boryczka, Urszula .
APPLIED SOFT COMPUTING, 2009, 9 (01) :61-70
[8]   Density-Based Clustering over an Evolving Data Stream with Noise [J].
Cao, Feng ;
Ester, Martin ;
Qian, Weining ;
Zhou, Aoying .
PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, :328-+
[9]  
Chen YX, 2007, KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P133
[10]  
de Souza V. M. A., 2015, SIAM INT C DAT MIN 2, P873, DOI [10.1137/1.9781611974010.98, DOI 10.1137/1.9781611974010.982015,SDM2015]