Clustering Big Data streams: recent challenges and contributions

Cited by: 2
Authors
Hassani, Marwan [1]
Seidl, Thomas [2]
Affiliations
[1] RWTH Aachen University, Data Management and Data Exploration Group, D-52074 Aachen, Germany
[2] Ludwig-Maximilians-Universität München, Database Systems Group, D-80538 Munich, Germany
Source
IT-INFORMATION TECHNOLOGY | 2016 / Vol. 58 / No. 4
Keywords
Subspace clustering; Big Data; data streams
DOI
10.1515/itit-2016-0007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Traditional clustering algorithms considered only static data. Today's applications and research problems in big data mining, however, have to deal with continuous, possibly infinite streams of data arriving at high velocity. Web traffic data, surveillance data, sensor measurements and stock trading are only some examples of these ever-growing applications. Since the growth of data volumes is accompanied by a similar rise in their dimensionality, clusters cannot be expected to appear in full when all attributes are considered together. Subspace clustering is a general approach that addresses this issue by automatically finding clusters hidden within different subsets of the attributes instead of considering all attributes at once. In this article, novel methods for efficient subspace clustering of high-dimensional big data streams are presented. Approaches that efficiently combine the anytime clustering concept with the stream subspace clustering paradigm are discussed. Additionally, efficient and adaptive density-based clustering algorithms for high-dimensional data streams are presented. Finally, a novel open-source assessment framework and new evaluation measures for subspace stream clustering are introduced.
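Two core ideas of the abstract can be illustrated with a small Python sketch. First, a cluster may be dense in some subsets of the attributes but not in the full space. Second, a stream summary should let older points lose influence. This is a hypothetical toy, not one of the methods presented in the article. The class FadingSubspaceGrid and its parameters (width, decay, min_weight) are invented for this example. It assumes fixed-width grid cells and the exponential fading factor 2^(-λΔt) known from density-based stream clustering such as DenStream.

```python
from collections import defaultdict
from itertools import combinations


class FadingSubspaceGrid:
    """Toy stream summary: grid cells in every subspace, with fading weights.

    Hypothetical illustration, not the article's algorithm. The cell width,
    decay rate and density threshold are assumed parameters.
    """

    def __init__(self, n_dims, width=1.0, decay=0.1, max_k=None):
        self.width = width
        self.decay = decay  # lambda in the fading factor 2 ** (-lambda * dt)
        k_max = max_k or n_dims
        # Enumerate every subspace (attribute subset) of size 1..k_max.
        self.subspaces = [dims for k in range(1, k_max + 1)
                          for dims in combinations(range(n_dims), k)]
        # (subspace, cell) -> [faded weight, time of last update]
        self.cells = defaultdict(lambda: [0.0, 0])

    def _fade(self, entry, now):
        # Decay the stored weight to time `now`, so old points count less.
        entry[0] *= 2 ** (-self.decay * (now - entry[1]))
        entry[1] = now

    def insert(self, point, now):
        # Project the point onto each subspace and add weight 1 to its cell.
        for dims in self.subspaces:
            cell = tuple(int(point[d] // self.width) for d in dims)
            entry = self.cells[(dims, cell)]
            self._fade(entry, now)
            entry[0] += 1.0

    def dense(self, now, min_weight):
        """Return {subspace: [cells whose faded weight >= min_weight]}."""
        out = defaultdict(list)
        for (dims, cell), entry in self.cells.items():
            self._fade(entry, now)
            if entry[0] >= min_weight:
                out[dims].append(cell)
        return dict(out)


if __name__ == "__main__":
    # Attributes 0 and 1 are tightly grouped; attribute 2 is spread out.
    stream = [(0.1, 0.2, 0.5), (0.3, 0.4, 3.5), (0.5, 0.1, 6.5),
              (0.2, 0.6, 9.5), (4.5, 8.5, 2.5)]
    grid = FadingSubspaceGrid(n_dims=3, width=1.0, decay=0.1)
    for t, p in enumerate(stream):
        grid.insert(p, now=t)
    # Dense only in subspaces of {0, 1}; nothing is dense in the full space.
    print(grid.dense(now=4, min_weight=3))
    # -> {(0,): [(0,)], (1,): [(0,)], (0, 1): [(0, 0)]}
```

The sketch enumerates all 2^d - 1 subspaces. That cost grows exponentially with the dimensionality, so practical subspace stream clustering methods must prune the search. This toy makes no attempt to do so. Querying the same grid much later (e.g. `now=20`) returns no dense cells, because the fading factor has shrunk the old weights below the threshold.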
Pages: 206-213
Number of pages: 8