Effective and efficient top-k query processing over incomplete data streams

Cited by: 12
Authors
Ren, Weilong [1]
Lian, Xiang [1]
Ghazinour, Kambiz [1, 2]
Affiliations
[1] Kent State Univ, Dept Comp Sci, Kent, OH 44242 USA
[2] State Univ New York, Ctr Criminal Justice Intelligence & Cybersecur, Canton, NY 13617 USA
Keywords
Top-k query; Incomplete data streams; Top-k-iDS
DOI
10.1016/j.ins.2020.08.011
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Nowadays, efficient and effective stream processing has become increasingly important in many real-world applications such as sensor data monitoring, network intrusion detection, and IP network traffic analysis. In practice, stream data often have missing attributes, due to causes such as packet loss and network congestion or failure. In such a scenario, it is important, yet challenging, to accurately and efficiently monitor top-k objects over an incomplete data stream, since these objects may indicate dangerous and critical security events (e.g., fire, network intrusion, or denial-of-service attack). In this paper, we formally define the problem of top-k query over incomplete data stream (Topk-iDS), which continuously detects the top-k objects with the highest ranking scores over an incomplete data stream. To handle the unique characteristics of this problem, namely incompleteness and stream processing, we propose a cost-model-based data imputation approach, design effective pruning strategies to reduce the Topk-iDS search space, and devise dynamically updated data synopses to facilitate Topk-iDS query processing. We also propose an efficient algorithm that performs data imputation and incremental Topk-iDS computation at the same time. Finally, through extensive experiments, we evaluate the efficiency and effectiveness of our proposed Topk-iDS query answering approach over both real and synthetic data sets. © 2020 Elsevier Inc. All rights reserved.
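To make the problem statement in the abstract concrete, the following is a minimal sketch of a naive baseline for top-k monitoring over an incomplete stream. It is not the paper's method: the cost-model-based imputation, pruning strategies, and data synopses are replaced by simple assumptions made here for illustration only. Missing attributes are written as None and filled with the mean of the known values in the current sliding window, the ranking score is the plain sum of attributes, and the top-k set is recomputed from scratch on every arrival. The names impute, score, and topk_over_stream are hypothetical.

```python
from collections import deque
import heapq


def impute(attrs, window_attrs):
    """Fill None entries with the per-attribute mean of known window values.

    Assumption for illustration: mean imputation stands in for the paper's
    cost-model-based imputation. Falls back to 0.0 if no value is known.
    """
    filled = []
    for i, value in enumerate(attrs):
        if value is None:
            known = [a[i] for a in window_attrs if a[i] is not None]
            value = sum(known) / len(known) if known else 0.0
        filled.append(value)
    return filled


def score(values):
    """Hypothetical ranking function: the sum of all attribute values."""
    return sum(values)


def topk_over_stream(stream, k, window_size):
    """Yield the top-k (score, object id) pairs after each arrival.

    Naive baseline: every object in the sliding window is re-imputed and
    re-scored on each arrival, with no pruning or incremental maintenance.
    """
    window = deque(maxlen=window_size)  # oldest object expires automatically
    for obj_id, attrs in stream:
        window.append((obj_id, attrs))
        window_attrs = [a for _, a in window]
        scored = [(score(impute(a, window_attrs)), oid) for oid, a in window]
        yield heapq.nlargest(k, scored)
```

For example, feeding the stream ("a", [1.0, 2.0]), ("b", [None, 4.0]), ("c", [5.0, None]) with k = 2 and a window of 3 ranks "c" and "b" above "a" once all three have arrived, because their missing attributes are filled from the other objects in the window. The per-arrival cost of this baseline grows with the window size, which is the kind of overhead that pruning and incremental computation aim to avoid.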
Pages: 343-371
Page count: 29