A Prefix-Filter based Method for Spatio-Textual Similarity Join

Cited by: 14
Authors
Liu, Sitong [1 ]
Li, Guoliang [1 ]
Feng, Jianhua [1 ]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol TNList, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Spatio-textual objects; similarity join; MBR prefix; hybrid signature;
DOI
10.1109/TKDE.2013.83
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Location-based services have attracted significant attention because modern mobile phones are equipped with GPS devices. These services generate large amounts of spatio-textual data that contain both spatial locations and textual descriptions. Since a spatio-textual object may have different representations, for example because of GPS deviations or different user descriptions, efficient methods are needed to integrate spatio-textual data from different sources. In this paper we study a new research problem called spatio-textual similarity join: given two sets of spatio-textual objects, find the similar object pairs. We make the following contributions: (1) We develop a filter-and-refine framework and devise several efficient algorithms. We extend the prefix filter technique to generate spatial and textual signatures for the objects and build inverted indexes on top of these signatures. We then generate candidate pairs using the inverted lists of signatures. Finally, we refine the candidates and generate the final result. (2) We study how to generate high-quality signatures for spatial information and develop an MBR-prefix based signature to prune large numbers of dissimilar object pairs. (3) We propose a hybrid signature scheme that supports textual pruning and spatial pruning simultaneously. (4) Experimental results on real and synthetic datasets show that our algorithms achieve high performance and scale well.
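The filter-and-refine framework described in the abstract (prefix-filter signatures, inverted lists, candidate generation, then refinement) can be illustrated with a minimal sketch. It rests on assumptions that this record does not confirm: each object is an axis-aligned rectangle plus a token set, textual similarity is Jaccard over tokens, spatial similarity is overlap area divided by union area, and a pair joins when both values reach thresholds tau_t and tau_s. Only the textual prefix is used as a signature here. The paper's MBR-prefix spatial signatures and hybrid signatures are not reproduced, so spatial pruning happens only during refinement. All function names and the sample data are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical object format (an assumption): (object_id, (x1, y1, x2, y2), set_of_tokens).


def token_jaccard(a, b):
    """Textual similarity (assumed): Jaccard over token sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def region_jaccard(a, b):
    """Spatial similarity (assumed): overlap area / union area of two rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def text_prefix(tokens, order, tau_t):
    """Prefix signature: the first |r| - ceil(tau_t * |r|) + 1 tokens in global order."""
    ranked = sorted(tokens, key=lambda w: (order[w], w))
    # A small epsilon keeps products that should be whole numbers (e.g. 0.7 * 10)
    # from rounding up because of floating-point error.
    keep = len(ranked) - math.ceil(tau_t * len(ranked) - 1e-9) + 1
    return ranked[:keep]


def st_join(R, S, tau_t, tau_s):
    # Global order: rare tokens first, so prefixes hold selective tokens.
    freq = defaultdict(int)
    for _, _, toks in R + S:
        for w in toks:
            freq[w] += 1

    # Filter step: inverted index from prefix tokens of S to positions in S.
    index = defaultdict(list)
    for j, (_, _, toks) in enumerate(S):
        for w in text_prefix(toks, freq, tau_t):
            index[w].append(j)

    result = []
    for rid, rreg, rtoks in R:
        # Candidates share at least one prefix token with r.
        cands = set()
        for w in text_prefix(rtoks, freq, tau_t):
            cands.update(index.get(w, []))
        # Refine step: verify both similarities exactly.
        for j in sorted(cands):
            sid, sreg, stoks = S[j]
            if (token_jaccard(rtoks, stoks) >= tau_t
                    and region_jaccard(rreg, sreg) >= tau_s):
                result.append((rid, sid))
    return result


R = [("r1", (0, 0, 2, 2), {"pizza", "hut", "main", "st"}),
     ("r2", (10, 10, 12, 12), {"coffee", "shop"})]
S = [("s1", (0, 0, 2, 2.2), {"pizza", "hut", "main", "street"}),
     ("s2", (10, 10, 12, 12), {"coffee", "shop", "downtown"}),
     ("s3", (50, 50, 52, 52), {"pizza", "hut", "main", "st"})]

print(st_join(R, S, 0.5, 0.8))  # [('r1', 's1'), ('r2', 's2')]
```

In this toy run, s3 is textually identical to r1 and survives the prefix filter, but it is rejected during refinement because the two regions do not overlap. Spatial signatures such as the MBR-prefix described in the abstract aim to prune pairs like this one before verification. Ordering tokens by ascending frequency is a common choice for prefix filtering because it keeps the inverted lists probed in the filter step short.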
Pages: 2354-2367
Page count: 14
Related papers (8)
  • [1] Top-k Spatio-Textual Similarity Join
    Hu, Huiqi
    Li, Guoliang
    Bao, Zhifeng
    Feng, Jianhua
    Wu, Yongwei
    Gong, Zhiguo
    Xu, Yaoqiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (02): 551-565
  • [2] An Efficient Block Index Scheme with Segmentation for Spatio-Textual Similarity Join
    Xiang, Yiming
    Zhuang, Yi
    Jiang, Nan
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2017, 11 (07): 3578-3593
  • [3] Privacy-Preserving Top-k Spatio-Textual Similarity Join
    Teng, Yiping
    Jiang, Dongyue
    Sun, Mengmeng
    Zhao, Liang
    Xu, Li
    Fan, Chunlong
    2022 IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, 2022: 718-726
  • [4] An Efficient Algorithm for Spatio-Textual Object Cluster Join
    Chen, Mingming
    Wang, Ning
    Zhu, Daxin
    Shang, Jedi S.
    BIG DATA RESEARCH, 2021, 25
  • [5] How improve Set Similarity Join based on prefix approach in distributed environment
    Zhu, Song
    Gagliardelli, Luca
    Simonini, Giovanni
    Beneventano, Domenico
    PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2018: 844-851
  • [6] Refining High-frequency-queries-based Filter for Similarity Join
    Chongstitvatana, Jaruloj
    Thitinanrungkit, Natthee
    2015 INTERNATIONAL COMPUTER SCIENCE AND ENGINEERING CONFERENCE (ICSEC), 2015: 68-72
  • [7] A Generic Method for Accelerating LSH-Based Similarity Join Processing
    Yu, Chenyun
    Nutanong, Sarana
    Li, Hangyu
    Wang, Cong
    Yuan, Xingliang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (04): 712-726
  • [8] Finding a Set of High-frequency Queries for High-frequency-query-based Filter for Similarity Join
    Kunanusont, Kamolwan
    Chongstitvatana, Jaruloj
    2015 12TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY (ECTI-CON), 2015,