Scalable nearest neighbor query processing based on Inverted Grid Index

被引:14
作者
Ji, Changqing [1 ,2 ]
Li, Zhiyang [1 ]
Qu, Wenyu [1 ]
Xu, Yujie [1 ]
Li, Yuanyuan [1 ,3 ]
机构
[1] Dalian Maritime Univ, Sch Informat Sci & Technol, Dalian 116026, Peoples R China
[2] Dalian Univ, Sch Phys Sci & Technol, Dalian 116622, Peoples R China
[3] Dalian Jiaotong Univ, Sch Software, Dalian 116028, Peoples R China
基金
美国国家科学基金会;
关键词
Spatial database; NN query; Inverted Grid Index; Big data;
D O I
10.1016/j.jnca.2014.05.010
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
With the increasing availability of Location-Based Services (LBS) and mobile internet, the amount of spatial data is growing larger. It poses new requirements and challenges for distributed index and query processing on large scale spatial data. A scalable and distributed spatial data index is important for the effective Nearest Neighbor (NN) query. There are several approaches that implement distributed indices and NN query processing with MapReduce, such as R-tree and Voronoi-based index. However, R-tree is unsuitable for parallelization and Voronoi requires extra computation for localization or local index reconstruction. In this paper, we investigate how to perform NN queries in a distributed environment. Firstly, we present distributed approaches that construct a novel distributed spatial data index: Inverted Grid Index, which is a combination of inverted index and grid partition. Secondly, we illustrate the implementations of two typical applications: distributed k Nearest Neighbor (kNN) and Reverse Nearest Neighbor (RNN) queries which are based on our index structure under cloud computing environment. Finally, we evaluate the effectiveness of our algorithms with extensive experiments using both real and synthetic data sets. Our experiments demonstrate that the time of constructing index structure decreases almost linearly as the number of cluster nodes increases. The results also demonstrate the efficiency and scalability of our NN query algorithms based on Inverted Grid Index. (C) 2014 Elsevier Ltd. All rights reserved.
引用
收藏
页码:172 / 182
页数:11
相关论文
共 25 条
[1]  
Akdogan A., 2010, Proceedings of the 2010 IEEE 2nd International Conference on Cloud Computing Technology and Science (CloudCom 2010), P9, DOI 10.1109/CloudCom.2010.92
[2]  
[Anonymous], 2004, P ACM SIGMOD
[3]   The k-Nearest Neighbour Join: Turbo Charging the KDD Process [J].
Boehm, Christian ;
Krebs, Florian .
KNOWLEDGE AND INFORMATION SYSTEMS, 2004, 6 (06) :728-749
[4]  
Borthakur D, 2007, The hadoop distributed file system: Architecture and design
[5]  
Cary A, 2009, LECT NOTES COMPUT SC, V5566, P302, DOI 10.1007/978-3-642-02279-1_24
[6]  
Changqing Ji, 2012, 2012 Seventh ChinaGrid Annual Conference (ChinaGrid 2012), P25, DOI 10.1109/ChinaGrid.2012.19
[7]  
Cheema MA, 2007, LECT NOTES COMPUT SC, V4443, P863
[8]  
Chen YX, 2007, KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P133
[9]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
[10]  
Hasan M, 2010, LECT NOTES COMPUT SC, V5981, P233, DOI 10.1007/978-3-642-12026-8_19