Distributed top-k aggregation queries at large

被引:12
|
作者
Neumann, Thomas [1 ]
Bender, Matthias [1 ]
Michel, Sebastian [2 ]
Schenkel, Ralf [1 ,3 ]
Triantafillou, Peter [4 ]
Weikum, Gerhard [1 ]
机构
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Univ Saarland, D-6600 Saarbrucken, Germany
[4] Univ Patras, Patras, Greece
关键词
Top-k; Distributed queries; Query optimization; Cost models; SELECTION QUERIES;
D O I
10.1007/s10619-009-7041-z
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
引用
收藏
页码:3 / 27
页数:25
相关论文
共 50 条
  • [21] Space Filling Approach for Distributed Processing of Top-k Dominating Queries
    Amagata, Daichi
    Hara, Takahiro
    Onizuka, Makoto
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (06) : 1150 - 1163
  • [22] A Distributed Approach for Top-k Star Queries on Massive Information Networks
    Jin, Jiahui
    Khemmarat, Samamon
    Gao, Lixin
    Luo, Junzhou
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 9 - 16
  • [23] Distributed probabilistic top-k dominating queries over uncertain databases
    Niranjan Rai
    Xiang Lian
    Knowledge and Information Systems, 2023, 65 : 4939 - 4965
  • [24] Distributed Processing of Probabilistic Top-k Queries in Wireless Sensor Networks
    Ye, Mao
    Lee, Wang-Chien
    Lee, Dik Lun
    Liu, Xingjie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 76 - 91
  • [25] Distributed probabilistic top-k dominating queries over uncertain databases
    Rai, Niranjan
    Lian, Xiang
    KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 65 (11) : 4939 - 4965
  • [26] Processing Top-k Join Queries
    Wu, Minji
    Berti-Equille, Laure
    Marian, Amelie
    Procopiuc, Cecilia M.
    Srivastava, Divesh
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01): : 860 - 870
  • [27] Top-k Combinatorial Skyline Queries
    Su, I-Fang
    Chung, Yu-Chi
    Lee, Chiang
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT II, PROCEEDINGS, 2010, 5982 : 79 - +
  • [28] Evaluating top-k selection queries
    Chaudhuri, S
    Gravano, L
    PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1999, : 399 - 410
  • [29] Top-k spatial preference queries
    Yiu, Man Lung
    Dai, Xiangyuan
    Mamoulis, Nikos
    Vaitis, Michail
    2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2007, : 1051 - +
  • [30] Top-k queries on temporal data
    Li, Feifei
    Yi, Ke
    Le, Wangchao
    VLDB JOURNAL, 2010, 19 (05): : 715 - 733