A Trie Based Set Similarity Query Algorithm

被引:1
作者
Jia, Lianyin [1 ,2 ]
Tang, Junzhuo [1 ]
Li, Mengjuan [3 ]
Li, Runxin [1 ]
Ding, Jiaman [1 ]
Chen, Yinong [4 ]
机构
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
基金
中国国家自然科学基金;
关键词
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT;
D O I
10.3390/math11010229
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most of the existing algorithms are inverted index based, they usually filter unqualified sets one by one and do not have sufficient support for duplicated sets, thus leading to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie based index for set similarity query, which can naturally group sets with the same prefix into one node, and can filter all sets corresponding to the node at a time, thereby significantly improving the candidates generation efficiency. In this paper, we find that the set similarity query problem can be transformed into matching nodes of the first-layer (FMNodes) detecting problem on T-starTrie. Therefore, an efficient FLMNode detection algorithm is designed. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
引用
收藏
页数:13
相关论文
共 50 条
[41]   A Comparison of the Spatial Arrangement Method and the Total-Set Pairwise Rating Method for Obtaining Similarity Data in the Conceptual Domain [J].
Verheyen, Steven ;
White, Anne ;
Storms, Gert .
MULTIVARIATE BEHAVIORAL RESEARCH, 2022, 57 (2-3) :356-384
[42]   Multi-Objective Decision Support for Irrigation Systems Based on Skyline Query [J].
Loh, Chee-Hoe ;
Chen, Yi-Chung ;
Su, Chwen-Tzeng ;
Lin, Sheng-Hao .
APPLIED SCIENCES-BASEL, 2024, 14 (03)
[43]   Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set [J].
Ye, Mao ;
Liu, Qiang .
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 :2246-2255
[44]   Accurate Range Query With Privacy Preservation for Outsourced Location-Based Service in IoT [J].
Liu, Zhaoman ;
Wu, Lei ;
Meng, Weizhi ;
Wang, Hao ;
Wang, Wei .
IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (18) :14322-14337
[45]   A Line Graph-Based Continuous Range Query Method for Moving Objects in Networks [J].
Zhang, Hengcai ;
Lu, Feng ;
Chen, Jie .
ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2016, 5 (12)
[46]   Secure k-NN Query With Multiple Keys Based on Random Projection Forests [J].
Zhang, Yunzhen ;
Wang, Baocang ;
Zhao, Zhen .
IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (09) :15205-15218
[47]   Hybrid level set method based on image diffusion [J].
Wang, Xiao-Feng ;
Zou, Le ;
Xu, Li-Xiang ;
Lv, Gang ;
Tang, Chao .
NEUROCOMPUTING, 2017, 228 :53-64
[48]   Factor Windows: Cost-based Query Rewriting for Optimizing Correlated Window Aggregates [J].
Wu, Wentao ;
Bernstein, Philip A. ;
Raizman, Alex ;
Pavlopoulou, Christina .
2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, :2722-2734
[49]   G-PICS: A Framework for GPU-Based Spatial Indexing and Query Processing [J].
Lewis, Zhila-Nouri ;
Tu, Yi-Cheng .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (03) :1243-1257
[50]   SimRank*: effective and scalable pairwise similarity search based on graph topology [J].
Yu, Weiren ;
Lin, Xuemin ;
Zhang, Wenjie ;
Pei, Jian ;
McCann, Julie A. .
VLDB JOURNAL, 2019, 28 (03) :401-426