Optimal algorithms for selecting top-k combinations of attributes: theory and applications

Times cited: 6

Authors:

Lin, Chunbin [1]

Lu, Jiaheng [2]

Wei, Zhewei [3]

Wang, Jianguo [1]

Xiao, Xiaokui [4]

Affiliations:

[1] University of California San Diego, Department of Computer Science and Engineering, San Diego, CA 92103, USA

[2] University of Helsinki, Department of Computer Science, Helsinki, Finland

[3] Renmin University of China, School of Information, Beijing, People's Republic of China

[4] Nanyang Technological University, School of Computer Science and Engineering, Singapore

Source:

The VLDB Journal | 2018 / Vol. 27 / No. 1

Funding:

Academy of Finland

Keywords:

Top-k query; Top-k,m query; Instance optimal algorithm; Keyword search; Relational databases; Queries

DOI:

10.1007/s00778-017-0485-2

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects (e.g., the top-k restaurants) with the highest overall scores aggregated from different attributes (e.g., price and location). However, emerging applications such as query recommendation require the best combinations of attributes rather than the best objects. Straightforward extensions of existing top-k algorithms are prohibitively expensive for answering top-k combination queries, because they must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k,m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k,m algorithms covering different data access methods (sorted accesses and random accesses) and different query certainties (exact query processing and approximate query processing). Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational cost, the memory cost and the number of accesses. We provide a case study of a real application of top-k,m queries in an online biomedical search engine. Finally, we perform comprehensive experiments on multiple real-life datasets to demonstrate the scalability and efficiency of our top-k,m algorithms.
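
To make the query definition and the cost of the naive approach concrete, here is a minimal brute-force sketch in Python of the enumeration baseline the abstract argues against. It is not the authors' instance-optimal algorithm. It assumes a hypothetical data model: attributes are organized into groups, a combination picks one attribute from each group, each attribute has a list of (object, score) pairs, an object's score under a combination is the sum of its scores in the chosen lists (a missing entry counts as 0), and a combination's score is the sum of its top-m object scores. The names `naive_top_k_m`, `combination_score`, `Group` and the toy data are illustrative only.

```python
import heapq
import itertools
from typing import Dict, List, Sequence, Tuple

# Hypothetical data model (an assumption for illustration, not the paper's):
# each group maps an attribute name to its list of (object_id, score) pairs.
Group = Dict[str, List[Tuple[str, float]]]
Result = List[Tuple[Tuple[str, ...], float]]


def combination_score(lists: Sequence[List[Tuple[str, float]]], m: int) -> float:
    """Return the sum of the m highest aggregated object scores.

    Assumption: an object's aggregated score is the sum of its scores across
    the selected lists; an object missing from a list contributes 0 there.
    """
    totals: Dict[str, float] = {}
    for ranked_list in lists:
        for obj, score in ranked_list:
            totals[obj] = totals.get(obj, 0.0) + score
    return sum(heapq.nlargest(m, totals.values()))


def naive_top_k_m(groups: Sequence[Group], k: int, m: int) -> Result:
    """Brute-force top-k,m baseline: score every combination, keep the best k."""
    scored = []
    # itertools.product yields one combination per choice of one attribute
    # from each group, so the loop runs prod(len(g) for g in groups) times.
    for combo in itertools.product(*(sorted(g) for g in groups)):
        lists = [groups[i][attr] for i, attr in enumerate(combo)]
        scored.append((combo, combination_score(lists, m)))
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy data: three restaurants scored under four hypothetical attributes.
    groups = [
        {"price": [("r1", 0.9), ("r2", 0.5), ("r3", 0.4)],
         "rating": [("r1", 0.2), ("r2", 0.8), ("r3", 0.7)]},
        {"location": [("r1", 0.6), ("r2", 0.3), ("r3", 0.9)],
         "service": [("r1", 0.1), ("r2", 0.4), ("r3", 0.2)]},
    ]
    for combo, score in naive_top_k_m(groups, k=2, m=2):
        print(combo, round(score, 2))
```

With the toy data and k = m = 2, the baseline scores all 2 × 2 = 4 combinations and prints ('price', 'location') with score 2.8, then ('rating', 'location') with score 2.7. Because the loop visits every combination, its cost grows multiplicatively with each added group; avoiding this full enumeration is the motivation for the top-k,m algorithms described above.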


Pages: 27-52

Number of pages: 26

References: 46
