Framework for real-time clustering over sliding windows

被引:3
作者
Badiozamany, Sobhan [1 ]
Orsborn, Kjell [1 ]
Risch, Tore [1 ]
机构
[1] Uppsala Univ, Box 337, SE-75105 Uppsala, Sweden
来源
28TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT (SSDBM) 2016) | 2016年
关键词
Sliding windows; Clustering; Framework;
D O I
10.1145/2949689.2949696
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Clustering queries over sliding windows require maintaining cluster memberships that change as windows slide. To address this, the Generic 2-phase Continuous Summarization framework (G2CS) utilizes a generation based window maintenance approach where windows are maintained over different time intervals. It provides algorithm independent and efficient sliding mechanisms for clustering queries where the clustering algorithms are defined in terms of queries over cluster data represented as temporal tables. A particular challenge for real-time detection of a high number of fastly evolving clusters is efficiently supporting smooth re-clustering in real-time, i.e. to minimize the sliding time with increasing window size and decreasing strides. To efficiently support such re-clustering for clustering algorithms where deletion of expired data is not supported, e.g. BIRCH, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM), which maintains several generations of intermediate window instances and does not require decremental cluster maintenance. To improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing. Extensive performance evaluation on both synthetic and real data shows that G2CS scales substantially better than related approaches.
引用
收藏
页数:13
相关论文
共 27 条
  • [1] Aggarwal C.C., 2004, Proceedings of the Thirtieth International Conference on Very Large Data Bases-Volume 30, VLDB '04
  • [2] Aggarwal CC, 2008, PROC INT CONF DATA, P150, DOI 10.1109/ICDE.2008.4497423
  • [3] [Anonymous], 2003, P 29 INT C VER LARG
  • [4] Babcock B., 2003, P 22 ACM SIGMOD SIGA, P234, DOI DOI 10.1145/773153.773176
  • [5] Berchtold S, 1996, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P28
  • [6] Slider: Incremental Sliding Window Analytics
    Bhatotia, Pramod
    Acar, Umut A.
    Junqueira, Flavio P.
    Rodrigues, Rodrigo
    [J]. ACM/IFIP/USENIX MIDDLEWARE 2014, 2014, : 61 - 72
  • [7] INCREMENTAL CLUSTERING FOR DYNAMIC INFORMATION-PROCESSING
    CAN, F
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1993, 11 (02) : 143 - 164
  • [8] Cranor Chuck., 2003, ACM SIGMOD
  • [9] Ester M., 1998, Proceedings of the Twenty-Fourth International Conference on Very-Large Databases, P323
  • [10] Fisher D. H., 1987, Machine Learning, V2, P139, DOI 10.1023/A:1022852608280