Lower Bounds for Randomized Read/Write Stream Algorithms

被引:15
作者
Beame, Paul [1 ]
Jayram, T. S.
Rudra, Atri [1 ]
机构
[1] Univ Washington, Seattle, WA 98195 USA
来源
STOC 07: PROCEEDINGS OF THE 39TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING | 2007年
关键词
data stream algorithms; communication complexity;
D O I
10.1145/1250790.1250891
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Motivated by the capabilities of modern storage architectures, we consider the following generalization of the data stream model where the algorithm has sequential access to multiple streams. Unlike the data stream model, where the stream is read only, in this new model (introduced in [8, 9]) the algorithms can also write onto streams. There is no limit on the size of the streams but the number of passes made on the streams is restricted. On the other hand, the amount of internal memory used by the algorithm is scarce, similar to data stream model. We resolve the main open problem in [7] of proving lower bounds in this model for algorithms that are allowed to have 2-sided error. Previously, such lower bounds were Shown only for deterministic and I-sided error randomized algorithms [9, 7]. We consider the classical set disjointness problem that has proved to be invaluable for deriving lower bounds for many other problems involving data streams and other randomized models of computation. For this problem, we show a near-linear lower bound on the size of the internal memory used by a randomized algorithm with 2-sided error that is allowed to have o(log N/log log N) passes over the streams. This bound is almost optimal since there is a simple algorithm that can solve this problem using logarithmic memory if the number of passes over the streams is allowed to be O(log N). Applications include near-linear lower bounds on the internal memory for well-known problems in the literature: (1) approximately counting the number of distinct elements in the input (F(0)); (2) approximating the frequency of the mode of an input sequence (F(infinity)*); (3) computing the join of two relations; and (4) deciding if some node of an XML document matches an XQuery (or XPath) query. Our techniques involve a novel direct-sum type of argument that yields lower bounds for many other problems. Our results asymptotically improve previously known bounds for any problem even in deterministic and I-sided error models of computation.
引用
收藏
页码:689 / 698
页数:10
相关论文
共 15 条
[1]   On the streaming model augmented with a sorting primitive [J].
Aggarwal, G ;
Datar, M ;
Rajagopalan, S ;
Ruhl, M .
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, :540-549
[2]  
Babai L., 1986, 27th Annual Symposium on Foundations of Computer Science (Cat. No.86CH2354-9), P337, DOI 10.1109/SFCS.1986.15
[3]  
Babcock B., 2002, Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), P1, DOI DOI 10.1145/543613.543615
[4]   Time-space trade-off lower bounds for randomized computation of decision problems [J].
Beame, P ;
Saks, M ;
Sun, XD ;
Vee, E .
JOURNAL OF THE ACM, 2003, 50 (02) :154-195
[5]   REVERSAL COMPLEXITY [J].
CHEN, JE ;
YAP, CK .
SIAM JOURNAL ON COMPUTING, 1991, 20 (04) :622-638
[6]   UNBIASED BITS FROM SOURCES OF WEAK RANDOMNESS AND PROBABILISTIC COMMUNICATION COMPLEXITY [J].
CHOR, B ;
GOLDREICH, O .
SIAM JOURNAL ON COMPUTING, 1988, 17 (02) :230-261
[7]  
Grohe M, 2005, LECT NOTES COMPUT SC, V3580, P1076
[8]  
GROHE M, 2006, P PODS 06, P243
[9]  
GROHE M, 2005, P PODS 05, P238
[10]   Tight lower bounds for the distinct elements problem [J].
Indyk, P ;
Woodruff, D .
44TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2003, :283-288