Top-k typicality queries and efficient query answering methods on large databases

被引:17
|
作者
Hua, Ming [1 ]
Pei, Jian [1 ]
Fu, Ada W. C. [2 ]
Lin, Xuemin [3 ,4 ]
Leung, Ho-Fung [2 ]
机构
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
[2] Chinese Univ Hong Kong, Shatin, Hong Kong, Peoples R China
[3] Univ New S Wales, Sydney, NSW, Australia
[4] NICTA, Sydney, NSW, Australia
来源
VLDB JOURNAL | 2009年 / 18卷 / 03期
基金
澳大利亚研究理事会; 加拿大自然科学与工程研究理事会;
关键词
Typicality analysis; Top-k query; Efficient query answering; CROSS-VALIDATION; DENSITY; SIMILARITY;
D O I
10.1007/s00778-008-0128-8
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Finding typical instances is an effective approach to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. Three types of top-k typicality queries are formulated. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Moreover, to answer questions like "Who are the best k typical guards in whole representing different types of guards?", the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.
引用
收藏
页码:809 / 835
页数:27
相关论文
共 50 条
  • [31] Efficient framework for processing top-k queries with replication in mobile ad hoc networks
    Sasaki, Yuya
    Hara, Takahiro
    Ishikawa, Yoshiharu
    GEOINFORMATICA, 2019, 23 (04) : 591 - 620
  • [32] A Toolkit for Managing Multiple Crowdsourced Top-K Queries
    Shan, Caihua
    Hou, Leong U.
    Mamoulis, Nikos
    Cheng, Reynold
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 3453 - 3456
  • [33] Optimizing Distributed Top-k Queries on Uncertain Data
    Zhao Zhibin
    Yu Yang
    Bao Yubin
    Yu Ge
    2013 25TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2013, : 3209 - 3214
  • [34] Uncertain top-k query processing in distributed environments
    Wang, Xite
    Shen, Derong
    Yu, Ge
    DISTRIBUTED AND PARALLEL DATABASES, 2016, 34 (04) : 567 - 589
  • [35] Top-k Query for Weighted Interactive Product Configuration
    Chen, Baijun
    Feng, Tao
    2018 17TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS ENGINEERING AND SCIENCE (DCABES), 2018, : 326 - 331
  • [36] An Approximate Top-k Query Algorithm in Distributed Network
    Li, Wenhua
    Yu, Wenting
    Xiao, Feng
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND NATURAL COMPUTING, VOL II, 2009, : 417 - 420
  • [37] Scalable top-k keyword search in relational databases
    Yanwei Xu
    Cluster Computing, 2019, 22 : 731 - 747
  • [38] Scalable top-k keyword search in relational databases
    Xu, Yanwei
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 1): : 731 - 747
  • [39] Approximate top-K answering under uncertain schema mappings
    Li, Longzhuang
    Tian, Feng
    Liu, Yonghuai
    Mao, Shanxian
    DATA & KNOWLEDGE ENGINEERING, 2018, 118 : 71 - 91
  • [40] Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries
    Zou, Lei
    Chen, Lei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (05) : 727 - 741