Approximation trade-offs in a Markovian stream warehouse: An empirical study

被引:3
作者
Letchner, J. [1 ]
Balazinska, M. [2 ]
Re, C. [3 ]
Philipose, M. [4 ]
机构
[1] Microsoft Corp, Bellevue, WA USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Univ Wisconsin, Madison, WI USA
[4] Intel Res, Seattle, WA USA
关键词
Streams; Imprecision; Uncertainty; Approximation; Compression; UNCERTAIN;
D O I
10.1016/j.is.2012.04.003
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
A large amount of the world's data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these imprecise streams are difficult to query efficiently because of their rich semantics and large volumes, forcing applications to sacrifice either performance or accuracy. There exists little work, however, that characterizes this trade-off space and helps applications make an appropriate choice. In this paper, we study the effects - on both efficiency and accuracy - of various stream approximations such as ignoring correlations, ignoring low-probability states, or retaining only the single most likely sequence of events. Through experiments on a real-world RFID data set, we identify conditions under which various approximations can improve performance by several orders of magnitude, with only minimal effects on query results. We also identify cases when the full rich semantics are necessary. This study is the first to evaluate the cost vs. quality trade-off of imprecise stream models. We perform this study using Lahar, a prototype Markovian stream warehouse. A secondary contribution of this paper is the development of query semantics and algorithms for processing aggregation queries on the output of pattern queries-we develop these queries in order to more fully understand the effects of approximation on a wider set of imprecise stream queries. (C) 2013 Published by Elsevier Ltd.
引用
收藏
页码:290 / 304
页数:15
相关论文
共 40 条
[1]  
Agrawai P., 2006, VLDB
[2]  
Agrawal J., 2008, SIGMOD 08, P147
[3]  
[Anonymous], 2004, RFID Journal
[4]  
[Anonymous], 1999, Learning in Graphical Models
[5]  
Antova L., 2007, ICDE
[6]  
Apaydin Tan., 2006, VLDB conference, P846
[7]   A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking [J].
Arulampalam, MS ;
Maskell, S ;
Gordon, N ;
Clapp, T .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2002, 50 (02) :174-188
[8]   OLAP over uncertain and imprecise data [J].
Burdick, Doug ;
Deshpande, Prasad M. ;
Jayram, T. S. ;
Ramakrishnan, Raghu ;
Vaithyanathan, Shivakumar .
VLDB JOURNAL, 2007, 16 (01) :123-144
[9]  
Cowell Robert G., 1999, Probabilistic networks and expert systems
[10]  
Dalvi Nilesh, 2004, VLDB, V30, P864