Fast SNN-Based Clustering Approach for Large Geospatial Data Sets

被引:6
作者
Antunes, Armenio [1 ]
Santos, Maribel Yasmina [1 ]
Moreira, Adriano [1 ]
机构
[1] Univ Minho, ALGORITMI Res Ctr, Guimaraes, Portugal
来源
CONNECTING A DIGITAL EUROPE THROUGH LOCATION AND PLACE | 2014年
关键词
TREES;
D O I
10.1007/978-3-319-03611-3_11
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Current positioning and sensing technologies enable the collection of very large spatio-temporal data sets. When analysing movement data, researchers often resort to clustering techniques to extract useful patterns from these data. Density-based clustering algorithms, although being very adequate to the analysis of this type of data, can be very inefficient when analysing huge amounts of data. The Shared Nearest Neighbour (SNN) algorithm presents low efficiency when dealing with high quantities of data due to its complexity evaluated in the worst case by O(n(2)). This chapter presents a clustering method, based on the SNN algorithm that significantly reduces the processing time by segmenting the spatial dimension of the data into a set of cells, and by minimizing the number of cells that have to be visited while searching for the k-nearest neighbours of each vector. The obtained results show an expressive reduction of the time needed to find the k-nearest neighbours and to compute the clusters, while producing results that are equal to those produced by the original SNN algorithm. Experimental results obtained with three different data sets (2D and 3D), one synthetic and two real, show that the proposed method enables the analysis of much larger data sets within reasonable amount of time.
引用
收藏
页码:179 / 195
页数:17
相关论文
共 13 条
[1]  
[Anonymous], KDD
[2]  
[Anonymous], 2003, P 3 SIAM INT C DAT M
[3]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[4]  
Bhavsar Hetal Bharat, 2009, 2009 WRI World Congress on Computer Science and Information Engineering, CSIE, P436, DOI 10.1109/CSIE.2009.997
[5]   Searching in metric spaces [J].
Chávez, E ;
Navarro, G ;
BaezaYates, R ;
Marroquín, JL .
ACM COMPUTING SURVEYS, 2001, 33 (03) :273-321
[6]  
ERTOZ L, 2002, P WORKSH CLUST HIGH, P105
[7]  
Faustino B, 2012, THESIS NEW U LISBON
[8]   CLUSTERING USING A SIMILARITY MEASURE BASED ON SHARED NEAR NEIGHBORS [J].
JARVIS, RA ;
PATRICK, EA .
IEEE TRANSACTIONS ON COMPUTERS, 1973, C-22 (11) :1025-1034
[9]   A Novel Fast Clustering Algorithm [J].
Li Xia ;
Jiang Sheng-yi ;
Su Xiao-ke .
2009 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, VOL IV, PROCEEDINGS, 2009, :284-+
[10]   The Impact of Data Quality in the Context of Pedestrian Movement Analysis [J].
Moreira, Adriano ;
Santos, Maribel Yasmina ;
Wachowicz, Monica ;
Orellana, Daniel .
GEOSPATIAL THINKING, 2010, :61-+