An efficient parallel processing method for skyline queries in MapReduce

Cited by: 0

Authors:

Junsu Kim

Myoung Ho Kim

Affiliation:

[1] KAIST, School of Computing

Source:

The Journal of Supercomputing | 2018 / Vol. 74

Keywords:

Skyline query processing; Parallel processing; Distributed processing; MapReduce; Distributed systems; Big data;

DOI:

Not available

Chinese Library Classification number:

Subject classification code:

Abstract:

Skyline queries are useful for finding only the interesting tuples in multi-dimensional datasets for multi-criteria decision making. To improve the performance of skyline query processing on large-scale data, it is necessary to use parallel and distributed frameworks such as MapReduce, which has been widely used recently. Several approaches process skyline queries on a MapReduce framework to improve query performance. Some methods process part of the skyline computation serially, while others process all parts of the skyline computation in parallel. However, each of them suffers from at least one of two drawbacks: (1) the serial computations may prevent them from fully utilizing the parallelism of the MapReduce framework; (2) when skyline queries are processed in a parallel and distributed manner, the additional overhead of parallel processing may outweigh the benefit gained from parallelization. To process skyline queries efficiently for large data in parallel, we propose a novel two-phase approach in the MapReduce framework. In the first phase, we divide the input dataset into a number of subsets (called cells) and then compute local skylines only for the qualified cells. The outer-cell filter used in this phase considerably improves performance by eliminating a large number of tuples in unqualified cells. In the second phase, the global skyline is computed from the local skylines. To determine global skyline tuples from each local skyline separately and in parallel, we design the inner-cell filter and also propose efficient methods to reduce the overhead caused by computing and utilizing the inner-cell filters. The primary advantage of our approach is that, with the two filtering techniques, it processes skyline queries fast and in a fully parallelized manner in all states of the MapReduce framework. Through extensive experiments, we demonstrate that the proposed approach substantially increases the overall performance of skyline queries in comparison with the state-of-the-art skyline processing methods. In particular, the proposed method achieves remarkably good performance and scalability with regard to the dataset size and the dimensionality. Our approach has significant benefits for large-scale query processing of skylines in distributed and parallel computing environments.
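
To make the terms in the abstract concrete, the following is a minimal single-machine sketch of a skyline query and of a grid-based two-phase computation. It is not the authors' MapReduce algorithm: the function names (`dominates`, `skyline`, `cell_of`, `cell_dominates`, `grid_skyline`), the assumption that smaller values are better, the assumption that all coordinates lie in [0, 1], and the simple cell-dominance rule standing in for the outer-cell filter are all illustrative choices. The second phase here is a plain serial merge, whereas the paper describes an inner-cell filter that lets this step run in parallel.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.

    Assumption: smaller values are preferred in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def cell_of(p: Point, cells_per_dim: int) -> Cell:
    """Map a point with coordinates in [0, 1] to its grid cell index."""
    return tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)


def cell_dominates(a: Cell, b: Cell) -> bool:
    """If a's index is strictly smaller in all dimensions, every point in
    cell a dominates every point in cell b."""
    return all(i < j for i, j in zip(a, b))


def grid_skyline(points: List[Point], cells_per_dim: int = 4) -> List[Point]:
    # Phase 1: partition into cells, discard cells dominated by another
    # non-empty cell (a stand-in for an outer-cell filter), and compute a
    # local skyline for each remaining ("qualified") cell.
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        cells[cell_of(p, cells_per_dim)].append(p)
    qualified = [c for c in cells if not any(cell_dominates(o, c) for o in cells)]
    local = {c: skyline(cells[c]) for c in qualified}

    # Phase 2: derive the global skyline from the local skylines.
    # Done serially here; the paper parallelizes this step.
    candidates = [p for pts in local.values() for p in pts]
    return skyline(candidates)
```

In a MapReduce setting, the per-cell work in the first phase is the part that would naturally be distributed across mappers and reducers; the sketch only shows why discarding dominated cells is safe, since dominance is transitive and every discarded tuple is dominated by some tuple in a surviving cell.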

Citation

Pages: 886 - 935

Page count: 49
