Top-k typicality queries and efficient query answering methods on large databases

被引:17
作者
Hua, Ming [1 ]
Pei, Jian [1 ]
Fu, Ada W. C. [2 ]
Lin, Xuemin [3 ,4 ]
Leung, Ho-Fung [2 ]
机构
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
[2] Chinese Univ Hong Kong, Shatin, Hong Kong, Peoples R China
[3] Univ New S Wales, Sydney, NSW, Australia
[4] NICTA, Sydney, NSW, Australia
基金
加拿大自然科学与工程研究理事会; 澳大利亚研究理事会;
关键词
Typicality analysis; Top-k query; Efficient query answering; CROSS-VALIDATION; DENSITY; SIMILARITY;
D O I
10.1007/s00778-008-0128-8
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Finding typical instances is an effective approach to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. Three types of top-k typicality queries are formulated. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Moreover, to answer questions like "Who are the best k typical guards in whole representing different types of guards?", the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.
引用
收藏
页码:809 / 835
页数:27
相关论文
共 50 条
  • [41] Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries
    Zou, Lei
    Chen, Lei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (05) : 727 - 741
  • [42] Evaluating continuous top-k queries over document streams
    Weixiong Rao
    Lei Chen
    Shudong Chen
    Sasu Tarkoma
    World Wide Web, 2014, 17 : 59 - 83
  • [43] Sliding-window top-k queries on uncertain streams
    Jin, Cheqing
    Yi, Ke
    Chen, Lei
    Yu, Jeffrey Xu
    Lin, Xuemin
    VLDB JOURNAL, 2010, 19 (03) : 411 - 435
  • [44] Social-Aware Spatial Top-k and Skyline Queries
    Sohail, Ammar
    Cheema, Muhammad Aamir
    Taniar, David
    COMPUTER JOURNAL, 2018, 61 (11) : 1620 - 1638
  • [45] Encoding Two-Dimensional Range Top-k Queries
    Jo, Seungbum
    Lingala, Rahul
    Satti, Srinivasa Rao
    ALGORITHMICA, 2021, 83 (11) : 3379 - 3402
  • [46] Distributed Top-K Join Queries Optimizing for RDF Datasets
    Gu, Jinguang
    Dong, Hao
    Liu, Zhao
    Xu, Fangfang
    INTERNATIONAL JOURNAL OF WEB SERVICES RESEARCH, 2017, 14 (03) : 67 - 83
  • [47] Evaluating continuous top-k queries over document streams
    Rao, Weixiong
    Chen, Lei
    Chen, Shudong
    Tarkoma, Sasu
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2014, 17 (01): : 59 - 83
  • [48] Top-k Queries for Multi-category RFID Systems
    Liu, Xiulong
    Li, Keqiu
    Wu, Jie
    Liu, Alex X.
    Xie, Xin
    Zhu, Chunsheng
    Xue, Weilian
    IEEE INFOCOM 2016 - THE 35TH ANNUAL IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS, 2016,
  • [49] Top-k Subgraph Query Based on Frequent Structure in Large-Scale Dynamic Graphs
    Shan, Xiaohuan
    Wang, Guangxiang
    Ding, Linlin
    Song, Baoyan
    Xu, Yan
    IEEE ACCESS, 2018, 6 : 78471 - 78482
  • [50] SET: Secure and Efficient Top-k Query in Two-Tiered Wireless Sensor Networks
    Zhang, Xiaoying
    Peng, Hui
    Dong, Lei
    Chen, Hong
    Sun, Hui
    WEB AND BIG DATA, APWEB-WAIM 2017, PT I, 2017, 10366 : 495 - 510