Pruning techniques for parallel processing of reverse top-k queries

被引：0

作者：

Panagiotis Nikitopoulos

Georgios A. Sfyris

Akrivi Vlachou

Christos Doulkeridis

Orestis Telelis

机构：

[1] University of Piraeus,Department of Digital Systems, School of Information and Communication Technologies

来源：

Distributed and Parallel Databases | 2021年 / 39卷

关键词：

Reverse top-k query; Parallel processing; Distributed data; Pruning techniques;

D O I：

暂无

中图分类号：

学科分类号：

摘要：

In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although recently the reverse top-k query operator has been studied extensively, its CPU-intensive nature results in prohibitively expensive processing cost, when applied on vast-sized data sets. This limitation motivates us to explore a scalable parallel processing solution, in order to enable reverse top-k processing over distributed large sets of input data in reasonable execution time. We present an algorithmic framework for the problem, in which different algorithms can be instantiated, targeting a generic parallel setting. We describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct, as an instantiation of the framework. Furthermore, we introduce novel pruning properties for the problem, and propose DiPaRT+ as another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments that demonstrate the improved efficiency of DiPaRT+ using data sets that are four orders of magnitude larger than those handled by centralized approaches.

引用

页码：169 / 199

页数：30

共 57 条

[1] Abouzeid A(2009)HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads PVLDB 2 922-933
[2] Bajda-Pawlikowski K(2012)The HaLoop approach to large-scale iterative data analysis VLDB J. 21 169-190
[3] Abadi DJ(2011)RanKloud: scalable multimedia data processing in server clusters IEEE MultiMedia 18 64-77
[4] Rasin A(2008)Mapreduce: simplified data processing on large clusters Commun. ACM 51 107-113
[5] Silberschatz A(2014)A survey of large-scale analytical query processing in mapreduce VLDB J. 23 355-380
[6] Bu Y(2015)Answering why-not questions on reverse top-k queries PVLDB 8 738-749
[7] Howe B(2013)Efficient all top-k computation: a unified solution for all top-k, reverse top-k and top-m influential queries IEEE TKDE 25 1015-1027
[8] Balazinska M(2017)User-centric similarity search IEEE Trans. Knowl. Data Eng. 29 200-213
[9] Ernst MD(2008)A survey of top-k query processing techniques in relational database systems ACM Comput. Surv. 40 1-58
[10] Candan KS(2013)Flexible and extensible preference evaluation in database systems ACM Trans. Database Syst. 38 17:1-17:43

← 1 2 3 4 5 6 →