Continuous Monitoring of Distributed Data Streams over a Time-Based Sliding Window

被引:2
作者
Ho-Leung Chan
Tak-Wah Lam
Lap-Kei Lee
Hing-Fung Ting
机构
[1] University of Hong Kong,Department of Computer Science
[2] Max-Planck-Institut für Informatik,undefined
来源
Algorithmica | 2012年 / 62卷
关键词
Algorithms; Distributed data streams; Communication; Frequent items; Quantiles;
D O I
暂无
中图分类号
学科分类号
摘要
In this paper we extend the study of algorithms for monitoring distributed data streams from whole data streams to a time-based sliding window. The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(\frac{k}{\varepsilon} \log\frac{\varepsilon N}{k})$\end{document} bits for basic counting, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(\frac{k}{\varepsilon} \log\frac{N}{k})$\end{document} words for frequent items and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(\frac{k}{\varepsilon^{2}} \log\frac{N}{k})$\end{document} words for quantiles, where k is the number of distributed data streams, N is the total number of items in the streams that arrive or expire in the window, and ε<1 is the given error bound. The performance of our algorithms matches and nearly matches the corresponding lower bounds. We also show how to generalize these results to streams with out-of-order data.
引用
收藏
页码:1088 / 1111
页数:23
相关论文
共 11 条
[1]  
Alon N.(1999)The space complexity of approximating the frequency moments J. Comput. Syst. Sci. 58 137-147
[2]  
Matias Y.(2004)Communication complexity of simultaneous messages SIAM J. Comput. 33 137-166
[3]  
Szegedy M.(2002)Maintaining stream statistics over sliding windows SIAM J. Comput. 31 1794-1813
[4]  
Babai L.(undefined)undefined undefined undefined undefined-undefined
[5]  
Gal A.(undefined)undefined undefined undefined undefined-undefined
[6]  
Kimmel P.(undefined)undefined undefined undefined undefined-undefined
[7]  
Lokam S.(undefined)undefined undefined undefined undefined-undefined
[8]  
Datar M.(undefined)undefined undefined undefined undefined-undefined
[9]  
Gionis A.(undefined)undefined undefined undefined undefined-undefined
[10]  
Indyk P.(undefined)undefined undefined undefined undefined-undefined