Top-k typicality queries and efficient query answering methods on large databases

被引:17
|
作者
Hua, Ming [1 ]
Pei, Jian [1 ]
Fu, Ada W. C. [2 ]
Lin, Xuemin [3 ,4 ]
Leung, Ho-Fung [2 ]
机构
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
[2] Chinese Univ Hong Kong, Shatin, Hong Kong, Peoples R China
[3] Univ New S Wales, Sydney, NSW, Australia
[4] NICTA, Sydney, NSW, Australia
来源
VLDB JOURNAL | 2009年 / 18卷 / 03期
基金
澳大利亚研究理事会; 加拿大自然科学与工程研究理事会;
关键词
Typicality analysis; Top-k query; Efficient query answering; CROSS-VALIDATION; DENSITY; SIMILARITY;
D O I
10.1007/s00778-008-0128-8
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Finding typical instances is an effective approach to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. Three types of top-k typicality queries are formulated. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Moreover, to answer questions like "Who are the best k typical guards in whole representing different types of guards?", the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.
引用
收藏
页码:809 / 835
页数:27
相关论文
共 50 条
  • [1] Top-k typicality queries and efficient query answering methods on large databases
    Ming Hua
    Jian Pei
    Ada W. C. Fu
    Xuemin Lin
    Ho-Fung Leung
    The VLDB Journal, 2009, 18 : 809 - 835
  • [2] Semantics and evaluation of top-k queries in probabilistic databases
    Zhang, Xi
    Chomicki, Jan
    DISTRIBUTED AND PARALLEL DATABASES, 2009, 26 (01) : 67 - 126
  • [3] A practical approach for efficiently answering top-k relational queries
    Ayanso, Anteneh
    Goes, Paulo B.
    Mehta, Kumar
    DECISION SUPPORT SYSTEMS, 2007, 44 (01) : 326 - 349
  • [4] CLASCN:: Candidate network selection for efficient top-k keyword queries over databases
    Zhang, Jun
    Peng, Zhao-Hui
    Wang, Shan
    Nie, Hui-Jing
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2007, 22 (02) : 197 - 207
  • [5] Answering Top-k Queries over Outsourced Sensitive Data in the Cloud
    Mahboubi, Sakina
    Akbarinia, Reza
    Valduriez, Patrick
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2018, PT I, 2018, 11029 : 218 - 231
  • [6] GStar: an efficient framework for answering top-k star queries on billion-node knowledge graphs
    Jin, Jiahui
    Luo, Junzhou
    Khemmarat, Samamon
    Dong, Fang
    Gao, Lixin
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (04): : 1611 - 1638
  • [7] Top-k Query Processing on Encrypted Databases with Strong Security Guarantees
    Meng, Xianrui
    Zhu, Haohan
    Kollios, George
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 353 - 364
  • [8] Efficient Authentication of Multi-Dimentional Top-k Queries
    Zhu, Xiaoyu
    Wu, Jie
    Chang, Wei
    Wang, Guojun
    Liu, Qin
    IEEE ACCESS, 2019, 7 : 4748 - 4762
  • [9] An Efficient Optimization Approach for Top-k Queries on Uncertain Data
    Zhang, Zhiqiang
    Wei, Xiaoyan
    Xie, Xiaoqin
    Pan, Haiwei
    Miao, Yu
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2018, 27 (01)
  • [10] Efficient and Secure Top-k Queries With Top Order-Preserving Encryption
    Quan, Hanyu
    Wang, Boyang
    Zhang, Yuqing
    Wu, Gaofei
    IEEE ACCESS, 2018, 6 : 31525 - 31540