An Efficient Algorithm for Spatio-Textual Object Cluster Join

Cited by: 0
Authors
Chen, Mingming [1]
Wang, Ning [1,2]
Zhu, Daxin [3]
Shang, Jedi S. [4]
Affiliations
[1] Xiamen Huaxia Univ, Coll Informat & Smart Electromech Engn, Informat Commun Technol & Smart Educ Fujian Engn, Xiamen, Fujian, Peoples R China
[2] Xiamen Univ, Dept Automat, Xiamen, Fujian, Peoples R China
[3] Fujian Prov Key Lab Data Intens Comp, Quanzhou, Peoples R China
[4] Thinvent Digital Technol CO LTD, Nanchang, Jiangxi, Peoples R China
Keywords
Cluster; Spatio-textual; Similarity join; SEARCH;
DOI
10.1016/j.bdr.2021.100191
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
With the proliferation of GPS-enabled devices and location-based services, spatio-textual objects play an indispensable role in spatial data management, and it is important to support join operations among groups of such objects. In this paper, we study a novel problem, the spatio-textual object cluster join (STOC-Join). Given two sets of spatio-textual objects D1 and D2 and a similarity threshold θ, STOC-Join finds all pairs of object clusters whose spatio-textual similarity is no less than θ. The problem arises in a variety of application scenarios, including location-based event detection, location-based data cleaning, and, more generally, pre-processing of location-based social media data. Processing STOC-Join efficiently is challenging in three respects: (1) how to define and compute the spatio-textual similarity between two clusters of spatio-textual objects effectively; (2) how to cluster a large number of spatio-textual objects efficiently; and (3) how to find similar cluster pairs efficiently while filtering out unqualified candidate pairs. To address these challenges, we define an effective, easy-to-compute similarity metric that measures the aggregated similarity between two groups of spatio-textual objects. Based on this metric, we propose a novel two-phase matching algorithm that clusters a large number of spatio-textual objects and finds all qualifying cluster pairs efficiently. Experiments on large real-life datasets confirm that the proposed two-phase matching algorithm is considerably more efficient than straightforward methods. © 2021 Elsevier Inc. All rights reserved.
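To make the problem statement concrete, the following is a minimal sketch of a brute-force join over cluster pairs, the kind of straightforward method a two-phase algorithm would be compared against. It is not the paper's algorithm or metric: the abstract does not specify either. The class `STObject`, the functions `object_similarity`, `cluster_similarity`, and `naive_stoc_join`, and the parameters `alpha` and `max_dist` are all hypothetical. The sketch assumes a weighted sum of spatial proximity and Jaccard keyword overlap, averaged over all object pairs, and it assumes the clusters are already given and non-empty.

```python
from dataclasses import dataclass
from itertools import product
from math import hypot


@dataclass(frozen=True)
class STObject:
    """A spatio-textual object: a point location plus a set of keywords."""
    x: float
    y: float
    terms: frozenset


def object_similarity(a, b, alpha=0.5, max_dist=10.0):
    """ASSUMED metric (not from the paper): weighted sum of spatial
    proximity and Jaccard overlap of the keyword sets."""
    # Proximity falls linearly from 1 (same point) to 0 (max_dist or farther).
    spatial = max(0.0, 1.0 - hypot(a.x - b.x, a.y - b.y) / max_dist)
    union = a.terms | b.terms
    textual = len(a.terms & b.terms) / len(union) if union else 0.0
    return alpha * spatial + (1.0 - alpha) * textual


def cluster_similarity(c1, c2):
    """ASSUMED aggregation (not from the paper): mean similarity over
    all cross-cluster object pairs. Both clusters must be non-empty."""
    total = sum(object_similarity(a, b) for a, b in product(c1, c2))
    return total / (len(c1) * len(c2))


def naive_stoc_join(clusters1, clusters2, theta):
    """Brute-force baseline: index pairs (i, j) whose cluster
    similarity is no less than theta."""
    return [
        (i, j)
        for i, c1 in enumerate(clusters1)
        for j, c2 in enumerate(clusters2)
        if cluster_similarity(c1, c2) >= theta
    ]


if __name__ == "__main__":
    cafe = frozenset({"cafe"})
    clusters1 = [[STObject(0.0, 0.0, cafe), STObject(1.0, 0.0, cafe)]]
    clusters2 = [
        [STObject(0.0, 0.0, cafe)],
        [STObject(100.0, 100.0, frozenset({"gym"}))],
    ]
    print(naive_stoc_join(clusters1, clusters2, theta=0.9))
```

This baseline evaluates every cluster pair and every object pair within it, so its cost grows with the product of the two input sizes. That cost is what candidate filtering, as mentioned in challenge (3), is meant to avoid.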
Pages: 6