A Storm-Based Parallel Clustering Algorithm of Streaming Data

Cited by: 0
Authors
Xu, Fang-Zhu [1, 2]
Jiang, Zhi-Ying [1, 2]
He, Yan-Lin [1, 2]
Wang, Ya-Jie [3]
Zhu, Qun-Xiong [1, 2]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
[2] Minist Educ China, Engn Res Ctr Intelligent PSE, Beijing 100029, Peoples R China
[3] Guizhou Food Safety Testing Engn Technol Res Ctr, Guiyang, Guizhou, Peoples R China
Source
NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV | 2018 / Vol. 11304
Keywords
Parallel clustering; Streaming data; Single-Pass algorithm
DOI
10.1007/978-3-030-04212-7_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To address the shortcomings of traditional Single-Pass clustering algorithms, such as low accuracy and a large amount of computation, a novel Storm-based parallel Single-Pass clustering algorithm is proposed for discovering hot events in the food field. To solve the problem of data inconsistency in parallel computing, the Single-Pass algorithm is improved with a method that dynamically acquires cluster increments and applies random delays. To validate the performance of the proposed method, a case study of news event classification is carried out. Simulation results show that, compared with the traditional Single-Pass algorithm, the proposed algorithm effectively reduces cluster repetition in the clustering results and greatly improves the accuracy and efficiency of clustering.
Pages: 134-144
Number of pages: 11