DGStream: High quality and efficiency stream clustering algorithm

Citations: 14
Authors
Ahmed, Rowanda [1]
Dalkilic, Gokhan [2]
Erten, Yusuf [1, 3]
Affiliations
[1] Izmir Inst Technol, Comp Engn Dept, TR-35433 Izmir, Turkey
[2] Dokuz Eylul Univ, Comp Engn Dept, TR-35390 Izmir, Turkey
[3] Bakircay Univ, TR-35665 Izmir, Turkey
Keywords
Data streams architectures; Data stream mining; Grid-based clustering; Density-based clustering; Online clustering
DOI
10.1016/j.eswa.2019.112947
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As applications produce overwhelming data streams, strategies for analyzing and clustering streaming data have become an urgent and crucial research area in knowledge discovery. The main objective of data stream clustering is to gain insight into incoming data. Recognizing all probable patterns in this unbounded data, which arrives at varying speeds and in varying structures and evolves over time, is very important in this analysis. Existing data stream clustering strategies all suffer from limitations, such as the inability to find arbitrarily shaped clusters and to handle outliers, and the need for parameter information in order to process the data. To handle these challenges quickly, accurately, and efficiently, we propose DGStream, a new online-offline grid- and density-based stream clustering algorithm. We conducted many experiments and evaluated the performance of DGStream on different simulated databases and under different parameter settings, considering a wide variety of concept drifts, novelty, evolving data, numbers and sizes of clusters, and outlier detection. Our algorithm is suitable for applications where interest lies in the most recent information, such as the stock market, for applications where analysis of existing information is required, and for cases where old and recent information are equally important. Experiments on synthetic and real datasets show that the proposed algorithm outperforms the other algorithms in efficiency. (C) 2019 Elsevier Ltd. All rights reserved.
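The abstract names an online-offline, grid- and density-based design without giving details. As a rough illustration of that general family of techniques only, and not the DGStream algorithm itself, the sketch below keeps a time-decayed density per grid cell in an online step and merges adjacent dense cells into clusters in an offline step. All names and parameters here (`cell_size`, `decay`, `dense_threshold`, the function names) are assumptions made for the example and do not come from the paper.

```python
from itertools import product


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_size) for x in point)


def online_update(grid, point, t, cell_size=1.0, decay=0.99):
    """Online phase (illustrative): add one point to its cell's decayed density."""
    key = cell_of(point, cell_size)
    density, last_t = grid.get(key, (0.0, t))
    # Fade the old density by the elapsed time, then count the new point.
    grid[key] = (density * decay ** (t - last_t) + 1.0, t)


def offline_cluster(grid, t, dense_threshold=2.0, decay=0.99):
    """Offline phase (illustrative): join neighbouring dense cells into clusters."""
    dense = {
        key
        for key, (density, last_t) in grid.items()
        if density * decay ** (t - last_t) >= dense_threshold
    }
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            cell = stack.pop()
            # Visit every cell that touches this one, including diagonals.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in labels:
                    labels[neighbour] = next_label
                    stack.append(neighbour)
        next_label += 1
    # Cells below the threshold are left unlabeled (treated as sparse/outliers).
    return labels
```

In this sketch, cells that never reach the density threshold receive no label, which is one simple way a grid-density method can set outliers aside; because clusters are unions of adjacent cells, they are not restricted to any particular shape.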
Pages: 13