Pruning techniques for parallel processing of reverse top-k queries

被引:0
作者
Panagiotis Nikitopoulos
Georgios A. Sfyris
Akrivi Vlachou
Christos Doulkeridis
Orestis Telelis
机构
[1] University of Piraeus,Department of Digital Systems, School of Information and Communication Technologies
来源
Distributed and Parallel Databases | 2021年 / 39卷
关键词
Reverse top-k query; Parallel processing; Distributed data; Pruning techniques;
D O I
暂无
中图分类号
学科分类号
摘要
In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although recently the reverse top-k query operator has been studied extensively, its CPU-intensive nature results in prohibitively expensive processing cost, when applied on vast-sized data sets. This limitation motivates us to explore a scalable parallel processing solution, in order to enable reverse top-k processing over distributed large sets of input data in reasonable execution time. We present an algorithmic framework for the problem, in which different algorithms can be instantiated, targeting a generic parallel setting. We describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct, as an instantiation of the framework. Furthermore, we introduce novel pruning properties for the problem, and propose DiPaRT+ as another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments that demonstrate the improved efficiency of DiPaRT+ using data sets that are four orders of magnitude larger than those handled by centralized approaches.
引用
收藏
页码:169 / 199
页数:30
相关论文
共 57 条
  • [1] Abouzeid A(2009)HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads PVLDB 2 922-933
  • [2] Bajda-Pawlikowski K(2012)The HaLoop approach to large-scale iterative data analysis VLDB J. 21 169-190
  • [3] Abadi DJ(2011)RanKloud: scalable multimedia data processing in server clusters IEEE MultiMedia 18 64-77
  • [4] Rasin A(2008)Mapreduce: simplified data processing on large clusters Commun. ACM 51 107-113
  • [5] Silberschatz A(2014)A survey of large-scale analytical query processing in mapreduce VLDB J. 23 355-380
  • [6] Bu Y(2015)Answering why-not questions on reverse top-k queries PVLDB 8 738-749
  • [7] Howe B(2013)Efficient all top-k computation: a unified solution for all top-k, reverse top-k and top-m influential queries IEEE TKDE 25 1015-1027
  • [8] Balazinska M(2017)User-centric similarity search IEEE Trans. Knowl. Data Eng. 29 200-213
  • [9] Ernst MD(2008)A survey of top-k query processing techniques in relational database systems ACM Comput. Surv. 40 1-58
  • [10] Candan KS(2013)Flexible and extensible preference evaluation in database systems ACM Trans. Database Syst. 38 17:1-17:43