Top-k typicality queries and efficient query answering methods on large databases

被引:17
|
作者
Hua, Ming [1 ]
Pei, Jian [1 ]
Fu, Ada W. C. [2 ]
Lin, Xuemin [3 ,4 ]
Leung, Ho-Fung [2 ]
机构
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
[2] Chinese Univ Hong Kong, Shatin, Hong Kong, Peoples R China
[3] Univ New S Wales, Sydney, NSW, Australia
[4] NICTA, Sydney, NSW, Australia
来源
VLDB JOURNAL | 2009年 / 18卷 / 03期
基金
澳大利亚研究理事会; 加拿大自然科学与工程研究理事会;
关键词
Typicality analysis; Top-k query; Efficient query answering; CROSS-VALIDATION; DENSITY; SIMILARITY;
D O I
10.1007/s00778-008-0128-8
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Finding typical instances is an effective approach to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. Three types of top-k typicality queries are formulated. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Moreover, to answer questions like "Who are the best k typical guards in whole representing different types of guards?", the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.
引用
收藏
页码:809 / 835
页数:27
相关论文
共 50 条
  • [21] Efficient Top-k Query Processing in Mobile Ad Hoc Networks
    Hara, Takahiro
    Hagihara, Ryo
    Sasaki, Yuya
    Shinohara, Masako
    Nishio, Shojiro
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2010, 1 (02): : 193 - 210
  • [22] Efficient Top-k Query Processing Algorithms in Highly Distributed Environments
    Fang, Qiming
    Yang, Guangwen
    JOURNAL OF COMPUTERS, 2014, 9 (09) : 2000 - 2006
  • [23] Monochromatic and Bichromatic Reverse Top-k Queries
    Vlachou, Akrivi
    Doulkeridis, Christos
    Kotidis, Yannis
    Norvag, Kjetil
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (08) : 1215 - 1229
  • [24] A SURVEY ON TOP-K QUERY PROCESSING IN MANETs
    Mohanapriya, T.
    Ranganathan, S. Raja
    Karthik, S.
    PROCEEDINGS OF 2017 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO 2017), 2017, : 480 - 484
  • [25] Effective and efficient top-k query processing over incomplete data streams
    Ren, Weilong
    Lian, Xiang
    Ghazinour, Kambiz
    INFORMATION SCIENCES, 2021, 544 : 343 - 371
  • [26] An efficient massive data retrieval algorithm based on modified Top-k query
    Peng, Xiao
    2015 SEVENTH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2015), 2015, : 102 - 105
  • [27] Efficient and Fast Distributed Top-k Query Protocol in Wireless Sensor Networks
    Tang, Shao-Jie
    Mao, Xufei
    Li, Xiang-Yang
    2011 19TH IEEE INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS (ICNP), 2011,
  • [28] A Cost-Based Range Estimation for Mapping Top-k Selection Queries over Relational Databases
    Ayanso, Anteneh
    Goes, Paulo B.
    Mehta, Kumar
    JOURNAL OF DATABASE MANAGEMENT, 2009, 20 (04) : 1 - 25
  • [29] Efficient Verifiable Top-k Queries in Two-tiered Wireless Sensor Networks
    Dai, Hua
    Yang, Geng
    Huang, Haiping
    Xiao, Fu
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2015, 9 (06): : 2111 - 2131
  • [30] Efficient framework for processing top-k queries with replication in mobile ad hoc networks
    Yuya Sasaki
    Takahiro Hara
    Yoshiharu Ishikawa
    GeoInformatica, 2019, 23 : 591 - 620