SNCStream: A Social Network-based Data Stream Clustering Algorithm

Cited by: 15
Authors
Barddal, Jean Paul [1]
Gomes, Heitor Murilo [1]
Enembreck, Fabricio [1]
Affiliations
[1] Pontificia Univ Catolica Parana, Programa Posgrad Informat, Curitiba, Parana, Brazil
Source
30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II | 2015
Keywords
Data Stream Clustering; Concept Drift; Novelty Detection; Social Network Analysis
DOI
10.1145/2695664.2695674
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Data Stream Clustering is an active area of research that requires efficient algorithms capable of finding and updating clusters incrementally. In addition, because data streams inherently evolve, these algorithms are expected to adapt quickly both to concept drifts and to the appearance and disappearance of clusters. Nevertheless, many of the two-step algorithms developed so far can only find hyper-spherical clusters and are highly dependent on parametrization. In this paper we introduce SNCStream, a one-step online clustering algorithm based on Social Networks Theory, which uses homophily to find non-hyper-spherical clusters. Our empirical studies show that SNCStream is able to surpass density-based algorithms in cluster quality and requires a feasible amount of resources (time and memory) when compared to other algorithms.
Pages: 935-940
Page count: 6