On Group Nearest Group Query Processing

被引:32
作者
Deng, Ke [1 ]
Sadiq, Shazia [1 ]
Zhou, Xiaofang [1 ]
Xu, Hu [2 ]
Fung, Gabriel Pui Cheong [3 ]
Lu, Yansheng [2 ]
机构
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[3] Arizona State Univ, Tempe, AZ 85287 USA
基金
澳大利亚研究理事会;
关键词
K-median clustering; group nearest group query; group nearest neighbor query; NEIGHBOR QUERIES;
D O I
10.1109/TKDE.2010.230
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Given a data point set D, a query point set Q, and an integer k, the Group Nearest Group (GNG) query finds a subset omega (vertical bar omega vertical bar <= k) of points from D such that the total distance from all points in Q to the nearest point in omega is not greater than any other subset omega' (vertical bar omega'vertical bar <= k) of points in D. GNG query is a partition-based clustering problem which can be found in many real applications and is NP-hard. In this paper, Exhaustive Hierarchical Combination (EHC) algorithm and Subset Hierarchial Refinement (SHR) algorithm are developed for GNG query processing. While EHC is capable to provide the optimal solution for k = 2, SHR is an efficient approximate approach that combines database techniques with local search heuristic. The processing focus of our approaches is on minimizing the access and evaluation of subsets of cardinality k in D since the number of such subsets is exponentially greater than vertical bar D vertical bar. To do that, the hierarchical blocks of data points at high level are used to find an intermediate solution and then refined by following the guided search direction at low level so as to prune irrelevant subsets. The comprehensive experiments on both real and synthetic data sets demonstrate the superiority of SHR in terms of efficiency and quality.
引用
收藏
页码:295 / 308
页数:14
相关论文
共 23 条
[1]  
Aarts E., 2003, Local search in combinatorial optimization
[2]  
Arya V., 2001, P 33 ACM S THEOR COM
[3]   Searching in high-dimensional spaces -: Index structures for improving the performance of multimedia Databases [J].
Böhm, C ;
Berchtold, S ;
Keim, D .
ACM COMPUTING SURVEYS, 2001, 33 (03) :322-373
[4]  
DENG K, 2007, P 23 IEEE INT C DAT, P23
[5]  
DENG K, 2009, P 25 IEEE INT C DAT
[6]   A data-clustering algorithm on distributed memory multiprocessors [J].
Dhillon, IS ;
Modha, DS .
LARGE-SCALE PARALLEL DATA MINING, 2000, 1759 :245-260
[7]  
ESTER M, 1995, P 1 ACM SIGKDD C KNO
[8]  
Garey M. R., 1979, Computers and intractability. A guide to the theory of NP-completeness
[9]  
Han J., 2001, GEOGRAPHIC DATA MINI, P1
[10]  
Hansen P, 1987, SYSTEMS CITIES FACIL