Dynamic adaptive data structures for monitoring data streams

Cited by: 3
Authors
Aguilar-Saborit, J. [1]
Trancoso, P. [2]
Muntes-Mulero, V. [3]
Larriba-Pey, J. L. [3]
Affiliations
[1] IBM Toronto Lab, Markham, ON L6G 1C7, Canada
[2] Univ Cyprus, Dept Comp Sci, Nicosia, Cyprus
[3] Univ Politecn Cataluna, Comp Architecture Dept, DAMA UPC, E-08028 Barcelona, Spain
Keywords
data streams; data structures; Bloom filters
DOI
10.1016/j.datak.2007.12.006
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Monitoring data streams is important in many areas. The relative importance of accuracy, response speed, memory use and adaptability to the changing nature of the data varies with the situation. Web page access monitoring, approximate aggregation in relational queries and IP message routing illustrate the range of those needs. Several data structures address this problem, such as counting Bloom filters, spectral Bloom filters and dynamic count filters. They range from static to complex dynamic representations of the data stream that keep an approximate count of the number of occurrences of each data value. In this paper, we focus on three main aspects. First, we put the problem in perspective and review the existing static and dynamic solutions. Second, we propose and analyze in depth a simple yet powerful partitioning strategy that reinforces the advantages of the methods proposed so far while solving most of their drawbacks. Finally, using real executions and mathematical models, we evaluate the existing methods alone and in combination with our partitioning strategy. We show that our partitioning strategy makes it possible to reduce memory requirements and average response time and to improve adaptiveness to changing data characteristics, while leaving the accuracy of the partitioned dynamic data structures intact. (C) 2008 Elsevier B.V. All rights reserved.
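As a rough illustration of the simplest structure named in the abstract, the sketch below shows a counting Bloom filter that keeps an approximate occurrence count per data value. It is a minimal sketch under stated assumptions, not the paper's implementation: the class name `CountingBloomFilter`, the SHA-256-based hashing, the parameters `num_counters` and `num_hashes`, and the minimum-counter estimate are all illustrative choices that do not come from the record. Fixed-width counters, overflow handling and the paper's partitioning strategy are not modeled.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter: an array of counters shared by all values."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        if num_counters <= 0 or num_hashes <= 0:
            raise ValueError("num_counters and num_hashes must be positive")
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        # Python ints are unbounded, so counter overflow is not modeled here.
        self.counters = [0] * num_counters

    def _positions(self, value: str) -> list[int]:
        """Map a value to its distinct counter positions (assumed hashing scheme)."""
        positions = set()
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % self.num_counters)
        return sorted(positions)

    def insert(self, value: str) -> None:
        """Record one occurrence of value by incrementing each of its counters."""
        for pos in self._positions(value):
            self.counters[pos] += 1

    def delete(self, value: str) -> None:
        """Remove one occurrence of value; refuses to push any counter below zero."""
        positions = self._positions(value)
        if any(self.counters[pos] == 0 for pos in positions):
            raise ValueError("value was never inserted")
        for pos in positions:
            self.counters[pos] -= 1

    def estimate(self, value: str) -> int:
        """Approximate count: the smallest counter among the value's positions."""
        return min(self.counters[pos] for pos in self._positions(value))


if __name__ == "__main__":
    cbf = CountingBloomFilter(num_counters=64, num_hashes=3)
    for page in ["/home", "/home", "/about", "/home"]:
        cbf.insert(page)
    print("estimated hits for /home:", cbf.estimate("/home"))
```

Because other values can only add to a shared counter, the estimate is never below the true count as long as only inserted values are deleted; collisions can make it larger, which is the approximation the abstract refers to.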
Pages: 92-115
Page count: 24