Top-k typicality queries and efficient query answering methods on large databases

被引：17

作者：

Hua, Ming ^{[1
]}

Pei, Jian ^{[1
]}

Fu, Ada W. C. ^{[2
]}

Lin, Xuemin ^{[3
,4
]}

Leung, Ho-Fung ^{[2
]}

机构：

[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada

[2] Chinese Univ Hong Kong, Shatin, Hong Kong, Peoples R China

[3] Univ New S Wales, Sydney, NSW, Australia

[4] NICTA, Sydney, NSW, Australia

来源：

VLDB JOURNAL | 2009年 / 18卷 / 03期

基金：

加拿大自然科学与工程研究理事会; 澳大利亚研究理事会;

关键词：

Typicality analysis; Top-k query; Efficient query answering; CROSS-VALIDATION; DENSITY; SIMILARITY;

D O I：

10.1007/s00778-008-0128-8

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Finding typical instances is an effective approach to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. Three types of top-k typicality queries are formulated. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Moreover, to answer questions like "Who are the best k typical guards in whole representing different types of guards?", the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.

引用

页码：809 / 835

页数：27

共 50 条

[41] Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries
Zou, Lei
Chen, Lei
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (05) : 727 - 741
[42] Evaluating continuous top-k queries over document streams
Weixiong Rao
Lei Chen
Shudong Chen
Sasu Tarkoma
World Wide Web, 2014, 17 : 59 - 83
[43] Sliding-window top-k queries on uncertain streams
Jin, Cheqing
Yi, Ke
Chen, Lei
Yu, Jeffrey Xu
Lin, Xuemin
VLDB JOURNAL, 2010, 19 (03) : 411 - 435
[44] Social-Aware Spatial Top-k and Skyline Queries
Sohail, Ammar
Cheema, Muhammad Aamir
Taniar, David
COMPUTER JOURNAL, 2018, 61 (11) : 1620 - 1638
[45] Encoding Two-Dimensional Range Top-k Queries
Jo, Seungbum
Lingala, Rahul
Satti, Srinivasa Rao
ALGORITHMICA, 2021, 83 (11) : 3379 - 3402
[46] Distributed Top-K Join Queries Optimizing for RDF Datasets
Gu, Jinguang
Dong, Hao
Liu, Zhao
Xu, Fangfang
INTERNATIONAL JOURNAL OF WEB SERVICES RESEARCH, 2017, 14 (03) : 67 - 83
[47] Evaluating continuous top-k queries over document streams
Rao, Weixiong
Chen, Lei
Chen, Shudong
Tarkoma, Sasu
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2014, 17 (01): : 59 - 83
[48] Top-k Queries for Multi-category RFID Systems
Liu, Xiulong
Li, Keqiu
Wu, Jie
Liu, Alex X.
Xie, Xin
Zhu, Chunsheng
Xue, Weilian
IEEE INFOCOM 2016 - THE 35TH ANNUAL IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS, 2016,
[49] Top-k Subgraph Query Based on Frequent Structure in Large-Scale Dynamic Graphs
Shan, Xiaohuan
Wang, Guangxiang
Ding, Linlin
Song, Baoyan
Xu, Yan
IEEE ACCESS, 2018, 6 : 78471 - 78482
[50] SET: Secure and Efficient Top-k Query in Two-Tiered Wireless Sensor Networks
Zhang, Xiaoying
Peng, Hui
Dong, Lei
Chen, Hong
Sun, Hui
WEB AND BIG DATA, APWEB-WAIM 2017, PT I, 2017, 10366 : 495 - 510

← 1 2 3 4 5 →