RT-DBSCAN: Real-Time Parallel Clustering of Spatio-Temporal Data Using Spark-Streaming

Cited by: 9
Authors
Gong, Yikai [1]
Sinnott, Richard O. [1]
Rimba, Paul [2]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] CSIRO, Data61, Sydney, NSW, Australia
Source
COMPUTATIONAL SCIENCE - ICCS 2018, PT I | 2018 / Vol. 10860
Keywords
DBSCAN; Clustering; Real-time systems; ALGORITHM;
DOI
10.1007/978-3-319-93698-7_40
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Clustering algorithms are essential for many big data applications involving point-based data, e.g. user-generated social media data from platforms such as Twitter. One of the most common approaches for clustering is DBSCAN. However, DBSCAN has numerous limitations. The algorithm itself is based on traversing the whole dataset and identifying the neighbours around each point. This approach is not suitable when data is created and streamed in real time. Instead, a more dynamic approach is required. This paper presents a new approach, RT-DBSCAN, that supports real-time clustering of data based on continuous cluster checkpointing. This approach overcomes many of the issues of existing clustering algorithms such as DBSCAN. The platform is realised using Apache Spark running over large-scale cloud resources and container-based technologies to support scaling. We benchmark the work using streamed social media content (Twitter) and show the advantages in performance and flexibility of RT-DBSCAN over other clustering approaches.
Pages: 524-539
Number of pages: 16