Top-k typicality queries and efficient query answering methods on large databases

被引：17

作者：

Hua, Ming ^{[1
]}

Pei, Jian ^{[1
]}

Fu, Ada W. C. ^{[2
]}

Lin, Xuemin ^{[3
,4
]}

Leung, Ho-Fung ^{[2
]}

机构：

[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada

[2] Chinese Univ Hong Kong, Shatin, Hong Kong, Peoples R China

[3] Univ New S Wales, Sydney, NSW, Australia

[4] NICTA, Sydney, NSW, Australia

来源：

VLDB JOURNAL | 2009年 / 18卷 / 03期

基金：

澳大利亚研究理事会; 加拿大自然科学与工程研究理事会;

关键词：

Typicality analysis; Top-k query; Efficient query answering; CROSS-VALIDATION; DENSITY; SIMILARITY;

D O I：

10.1007/s00778-008-0128-8

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Finding typical instances is an effective approach to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. Three types of top-k typicality queries are formulated. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Moreover, to answer questions like "Who are the best k typical guards in whole representing different types of guards?", the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.

引用

页码：809 / 835

页数：27

共 50 条

[31] Efficient framework for processing top-k queries with replication in mobile ad hoc networks
Sasaki, Yuya
Hara, Takahiro
Ishikawa, Yoshiharu
GEOINFORMATICA, 2019, 23 (04) : 591 - 620
[32] A Toolkit for Managing Multiple Crowdsourced Top-K Queries
Shan, Caihua
Hou, Leong U.
Mamoulis, Nikos
Cheng, Reynold
CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 3453 - 3456
[33] Optimizing Distributed Top-k Queries on Uncertain Data
Zhao Zhibin
Yu Yang
Bao Yubin
Yu Ge
2013 25TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2013, : 3209 - 3214
[34] Uncertain top-k query processing in distributed environments
Wang, Xite
Shen, Derong
Yu, Ge
DISTRIBUTED AND PARALLEL DATABASES, 2016, 34 (04) : 567 - 589
[35] Top-k Query for Weighted Interactive Product Configuration
Chen, Baijun
Feng, Tao
2018 17TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS ENGINEERING AND SCIENCE (DCABES), 2018, : 326 - 331
[36] An Approximate Top-k Query Algorithm in Distributed Network
Li, Wenhua
Yu, Wenting
Xiao, Feng
PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND NATURAL COMPUTING, VOL II, 2009, : 417 - 420
[37] Scalable top-k keyword search in relational databases
Yanwei Xu
Cluster Computing, 2019, 22 : 731 - 747
[38] Scalable top-k keyword search in relational databases
Xu, Yanwei
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 1): : 731 - 747
[39] Approximate top-K answering under uncertain schema mappings
Li, Longzhuang
Tian, Feng
Liu, Yonghuai
Mao, Shanxian
DATA & KNOWLEDGE ENGINEERING, 2018, 118 : 71 - 91
[40] Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries
Zou, Lei
Chen, Lei
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (05) : 727 - 741

← 1 2 3 4 5 →