StreamLeader: A New Stream Clustering Algorithm not Based in Conventional Clustering

Cited by: 0
Authors
Andres-Merino, Jaime [1]
Belanche, Lluis A. [1]
Affiliations
[1] Tech Univ Catalonia, Dept Comp Sci, Jordi Girona 1-3, Barcelona 08034, Spain
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2016, PT II | 2016, Vol. 9887
Keywords
Stream algorithms; Clustering; Big Data
DOI
10.1007/978-3-319-44781-0_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Stream clustering algorithms normally require two phases: an online first phase that statistically summarizes the stream while forming special structures, such as micro-clusters, and a second, offline phase that uses a conventional clustering algorithm, taking the micro-clusters as pseudo-points, to deliver the final clustering. This procedure tends to produce oversized or overlapping clusters in medium-to-high dimensional spaces, and typically degrades seriously in noisy data environments. In this paper we introduce STREAMLEADER, a novel stream clustering algorithm suitable for massive data that does not resort to a conventional clustering phase; it is based on the notion of Leader Cluster and on an aggressive noise reduction process. We report extensive systematic testing in which the new algorithm is shown to consistently outperform its contenders in terms of both quality and scalability.
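The abstract does not spell out how a Leader Cluster is built, so the following is only a minimal sketch of the classic one-pass leader clustering idea that the name suggests, not the STREAMLEADER algorithm itself. The function name `leader_clustering`, the `threshold` parameter, and the use of Euclidean distance are all assumptions made for illustration; noise reduction and stream summarization are not modeled.

```python
import math
from typing import List, Sequence, Tuple


def leader_clustering(
    points: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Classic single-pass leader clustering (illustrative, hypothetical helper).

    Each point is read once. It joins the nearest existing leader if that
    leader lies within `threshold` (Euclidean distance); otherwise the point
    becomes a new leader. No separate offline clustering phase is run.
    """
    leaders: List[Tuple[float, ...]] = []
    labels: List[int] = []
    for p in points:
        # Find the closest leader seen so far.
        best_idx, best_dist = -1, math.inf
        for i, leader in enumerate(leaders):
            d = math.dist(p, leader)
            if d < best_dist:
                best_idx, best_dist = i, d
        if best_idx >= 0 and best_dist <= threshold:
            labels.append(best_idx)      # absorbed by an existing leader
        else:
            leaders.append(tuple(p))     # point starts a new cluster
            labels.append(len(leaders) - 1)
    return leaders, labels


if __name__ == "__main__":
    stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
    leaders, labels = leader_clustering(stream, threshold=1.0)
    print(leaders)  # [(0.0, 0.0), (5.0, 5.0)]
    print(labels)   # [0, 0, 1, 1, 0]
```

Because every point is handled once and only the leaders are stored, a scheme of this kind avoids the offline pass over micro-clusters that the abstract describes for conventional two-phase methods. Its result depends on the arrival order of the points and on the chosen threshold.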
Pages: 208-215
Page count: 8