HiSpatialCluster: A novel high-performance software tool for clustering massive spatial points

Times cited: 13
Authors
Chen, Yiran [1 ,2 ]
Huang, Zhou [1 ,2 ]
Pei, Tao [3 ]
Liu, Yu [1 ,2 ]
Affiliations
[1] Peking Univ, Inst Remote Sensing & Geog Informat Syst, Beijing 100871, Peoples R China
[2] Peking Univ, Beijing Key Lab Spatial Informat Integrat & Its A, Beijing 100871, Peoples R China
[3] Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
ALGORITHM;
DOI
10.1111/tgis.12463
Chinese Library Classification number
P9 [Physical geography]; K9 [Geography];
Discipline classification code
0705 ; 070501 ;
Abstract
In the era of big data, spatial clustering is an important means of geo-data analysis. When clustering big geo-data such as social media check-in data, geotagged photos, and taxi trajectory points, traditional spatial clustering algorithms face growing challenges. On the one hand, existing spatial clustering tools cannot support the clustering of massive point sets; on the other hand, there is no perfect solution for self-adaptive spatial clustering. To cluster millions or even billions of points adaptively, a new spatial clustering tool, HiSpatialCluster, was proposed. It combines the CFSFDP (clustering by fast search and finding density peaks) idea for finding cluster centers with the DBSCAN (density-based spatial clustering of applications with noise) idea of density-connect filtering for classification. The tool's source code and other resources have been released on GitHub, and an experimental evaluation was performed by clustering massive taxi trajectory points and Flickr geotagged photos in Beijing, China. The spatial clustering results were also compared with those of K-means and DBSCAN. As a spatial clustering tool, HiSpatialCluster is expected to play a fundamental role in big geo-data research. First, the tool enables adaptive clustering of massive point datasets with uneven spatial density distributions. Second, the density-connect filter method is applied to generate homogeneous analysis units from geotagged data. Third, the tool is accelerated by both parallel CPU and GPU computing, so that millions or even billions of points can be clustered efficiently.
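The density-peaks (CFSFDP) idea named in the abstract can be illustrated with a minimal brute-force sketch. This is not the HiSpatialCluster implementation: the function name `density_peaks`, the parameters `cutoff` and `n_centers`, and the simple cutoff-count density are assumptions made for illustration, and the density-connect filtering step and the CPU/GPU parallelization described in the abstract are not shown.

```python
import math


def density_peaks(points, cutoff, n_centers):
    """Brute-force O(n^2) sketch of density-peaks (CFSFDP-style) clustering.

    points    : list of (x, y) tuples
    cutoff    : hypothetical distance threshold used to count neighbours
    n_centers : hypothetical number of cluster centers to keep (>= 1)
    Returns (labels, centers), where centers are point indices.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n    # distance to the nearest point ranked denser
    parent = [-1] * n    # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; by convention
            # it receives the largest distance to any other point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers combine high density with large separation (rho * delta).
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_centers]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Every other point inherits the label of its nearest denser point,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, centers = density_peaks(blob_a + blob_b, cutoff=1.1, n_centers=2)
    print(labels, centers)
```

On the two small synthetic blobs in the usage example, the middle point of each blob has the highest density and is selected as a center. The quadratic distance matrix makes this sketch suitable only for small inputs; scaling to the millions of points the abstract mentions would require spatial indexing or the parallel computing the authors describe.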
Pages: 1275-1298
Page count: 24
Related papers
31 in total
[1]  
Ankerst M., 1999, SIGMOD Record, V28, P49, DOI 10.1145/304181.304187
[2]  
[Anonymous], 2010, P 1 INT C EXH COMP G
[3]   ST-DBSCAN: An algorithm for clustering spatial-temporal data [J].
Birant, Derya ;
Kut, Alp .
DATA & KNOWLEDGE ENGINEERING, 2007, 60 (01) :208-221
[4]   An adaptive spatial clustering algorithm based on delaunay triangulation [J].
Deng, Min ;
Liu, Qiliang ;
Cheng, Tao ;
Shi, Yan .
COMPUTERS ENVIRONMENT AND URBAN SYSTEMS, 2011, 35 (04) :320-332
[5]  
Ertöz L, 2003, SIAM PROC S, P47
[6]  
Ester M., 1996, P 2 INT C KNOWL DISC, V96, P226
[7]   Multi-level clustering and its visualization for exploratory spatial analysis [J].
Estivill-Castro, V ;
Lee, I .
GEOINFORMATICA, 2002, 6 (02) :123-152
[8]  
Estivill-Castro V., 2002, Computers, Environment and Urban Systems, V26, P315, DOI 10.1016/S0198-9715(01)00044-8
[9]   Cure: An efficient clustering algorithm for large databases [J].
Guha, S ;
Rastogi, R ;
Shim, K .
INFORMATION SYSTEMS, 2001, 26 (01) :35-58
[10]   Discovering Spatial Patterns in Origin-Destination Mobility Data [J].
Guo, Diansheng ;
Zhu, Xi ;
Jin, Hai ;
Gao, Peng ;
Andris, Clio .
TRANSACTIONS IN GIS, 2012, 16 (03) :411-429