Distributed top-k aggregation queries at large

被引:12
|
作者
Neumann, Thomas [1 ]
Bender, Matthias [1 ]
Michel, Sebastian [2 ]
Schenkel, Ralf [1 ,3 ]
Triantafillou, Peter [4 ]
Weikum, Gerhard [1 ]
机构
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Univ Saarland, D-6600 Saarbrucken, Germany
[4] Univ Patras, Patras, Greece
关键词
Top-k; Distributed queries; Query optimization; Cost models; SELECTION QUERIES;
D O I
10.1007/s10619-009-7041-z
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
引用
收藏
页码:3 / 27
页数:25
相关论文
共 50 条
  • [41] Subscription-based data aggregation techniques for top-k monitoring queries
    Udomlamlert, Kamalas
    Hara, Takahiro
    Nishio, Shojiro
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2017, 20 (02): : 237 - 265
  • [42] Top-k Distance Queries on Large Time-Evolving Graphs
    D'Ascenzo, Andrea
    D'Emidio, Mattia
    IEEE ACCESS, 2023, 11 : 102228 - 102242
  • [43] Top-k Queries for Categorized RFID Systems
    Liu, Xiulong
    Li, Keqiu
    Guo, Song
    Liu, Alex X.
    Li, Peng
    Wang, Kun
    Wu, Jie
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2017, 25 (05) : 2587 - 2600
  • [44] Top-k Differential Queries in Graph Databases
    Vasilyeva, Elena
    Thiele, Maik
    Bornhoevd, Christof
    Lehner, Wolfgang
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS (ADBIS 2014), 2014, 8716 : 112 - 125
  • [45] Processing top-k queries from samples
    Cohen, Edith
    Grossaug, Nadav
    Kaplan, Haim
    COMPUTER NETWORKS, 2008, 52 (14) : 2605 - 2622
  • [46] Top-k Dominating Queries on Incomplete Data
    Miao, Xiaoye
    Gao, Yunjun
    Zheng, Baihua
    Chen, Gang
    Cui, Huiyong
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1500 - 1501
  • [47] CrowdK: Answering top-k queries with crowdsourcing
    Lee, Jongwuk
    Lee, Dongwon
    Hwang, Seung-won
    INFORMATION SCIENCES, 2017, 399 : 98 - 120
  • [48] Top-k differential queries in graph databases
    Vasilyeva, Elena
    Thiele, Maik
    Bornhövd, Christof
    Lehner, Wolfgang
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8716 : 112 - 115
  • [49] Approximate top-k queries in sensor networks
    Patt-Shamir, Boaz
    Shafrir, Allon
    STRUCTURAL INFORMATION AND COMMUNICATION COMPLEXITY, PROCEEDINGS, 2006, 4056 : 319 - +
  • [50] Reverse spatial top-k keyword queries
    Ahmed, Pritom
    Eldawy, Ahmed
    Hristidis, Vagelis
    Tsotras, Vassilis J.
    VLDB JOURNAL, 2023, 32 (03): : 501 - 524