Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks

Cited by: 0
Authors
Zengfeng Huang
Ke Yi
Qin Zhang
Affiliations
[1] School of Data Science, Fudan University
[2] The Hong Kong University of Science and Technology
[3] Indiana University Bloomington
Source
Algorithmica | 2019 / Vol. 81
Keywords
Continuous distributed tracking; Randomized algorithms; Distributed streaming
DOI
Not available
Abstract
We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the count-tracking problem, where there are $k$ players, each holding a counter $n_i$ that gets incremented over time, and the goal is to track an $\varepsilon$-approximation of their sum $n=\sum_i n_i$ continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is $\Theta(k/\varepsilon \cdot \log N)$, where $N$ is the final value of $n$ when the tracking finishes, we show that with randomization, the communication cost can be reduced to $\Theta(\sqrt{k}/\varepsilon \cdot \log N)$.
Our algorithm is simple and uses only O(1) space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems: frequency-tracking and rank-tracking, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large data monitoring and analysis, and have been extensively studied in the literature.
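To make the count-tracking setting concrete, the following is a minimal Python sketch of a standard deterministic baseline, not the randomized algorithm of the paper: each player reports its counter to a coordinator whenever the counter exceeds $(1+\varepsilon)$ times the last reported value, which costs $O(k/\varepsilon \cdot \log N)$ messages in total. The names `Site`, `simulate`, and the uniformly random choice of which player is incremented are assumptions made for illustration only.

```python
import random


class Site:
    """One player holding a local counter (hypothetical helper class)."""

    def __init__(self, eps):
        self.eps = eps
        self.count = 0       # true local counter n_i
        self.last_sent = 0   # value most recently reported to the coordinator

    def increment(self):
        """Increment n_i; return the new value if a report is sent, else None."""
        self.count += 1
        # Report only when n_i has grown by more than a (1 + eps) factor.
        if self.count > (1 + self.eps) * self.last_sent:
            self.last_sent = self.count
            return self.count
        return None


def simulate(k, eps, num_increments, seed=0):
    """Run the baseline; return (messages sent, worst relative error seen)."""
    rng = random.Random(seed)
    sites = [Site(eps) for _ in range(k)]
    reported = [0] * k       # coordinator's view of each n_i
    messages = 0
    worst_error = 0.0
    for n in range(1, num_increments + 1):
        i = rng.randrange(k)             # assumed arrival pattern
        value = sites[i].increment()
        if value is not None:
            reported[i] = value
            messages += 1
        estimate = sum(reported)         # coordinator's estimate of n
        worst_error = max(worst_error, (n - estimate) / n)
    return messages, worst_error


if __name__ == "__main__":
    msgs, err = simulate(k=4, eps=0.1, num_increments=2000)
    print(f"messages={msgs}, worst relative error={err:.4f}")
```

Because every player keeps its counter within a $(1+\varepsilon)$ factor of its last report, the coordinator's sum never underestimates $n$ by more than a relative $\varepsilon$. The randomized result in the abstract improves the dependence on $k$ from $k$ to $\sqrt{k}$.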
Pages: 2222-2243
Number of pages: 21