A Trie Based Set Similarity Query Algorithm

被引:1
|
作者
Jia, Lianyin [1 ,2 ]
Tang, Junzhuo [1 ]
Li, Mengjuan [3 ]
Li, Runxin [1 ]
Ding, Jiaman [1 ]
Chen, Yinong [4 ]
机构
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
基金
中国国家自然科学基金;
关键词
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT;
D O I
10.3390/math11010229
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most of the existing algorithms are inverted index based, they usually filter unqualified sets one by one and do not have sufficient support for duplicated sets, thus leading to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie based index for set similarity query, which can naturally group sets with the same prefix into one node, and can filter all sets corresponding to the node at a time, thereby significantly improving the candidates generation efficiency. In this paper, we find that the set similarity query problem can be transformed into matching nodes of the first-layer (FMNodes) detecting problem on T-starTrie. Therefore, an efficient FLMNode detection algorithm is designed. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
引用
收藏
页数:13
相关论文
共 50 条
  • [11] Set Similarity Joins on MapReduce: An Experimental Survey
    Fier, Fabian
    Augsten, Nikolaus
    Bouros, Panagiotis
    Leser, Ulf
    Freytag, Johann-Christoph
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (10): : 1110 - 1122
  • [12] ETI: an efficient index for set similarity queries
    Lianyin Jia
    Jianqing Xi
    Mengjuan Li
    Yong Liu
    Decheng Miao
    Frontiers of Computer Science, 2012, 6 : 700 - 712
  • [13] ETI: an efficient index for set similarity queries
    Jia, Lianyin
    Xi, Jianqing
    Li, Mengjuan
    Liu, Yong
    Miao, Decheng
    FRONTIERS OF COMPUTER SCIENCE, 2012, 6 (06) : 700 - 712
  • [14] A novel approach for high-dimensional vector similarity join query
    Ma, Youzhong
    Jia, Shijie
    Zhang, Yongxin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (05)
  • [15] Modified set partitioning in hierarchical trees algorithm based on hierarchical subbands
    Ochoa Dominguez, Humberto de J.
    Vergara Villegas, Osslan O.
    Cruz Sanchez, Vianey G.
    JOURNAL OF ELECTRONIC IMAGING, 2015, 24 (03)
  • [16] Subgraph Matching with Set Similarity in a Large Graph Database
    Hong, Liang
    Zou, Lei
    Lian, Xiang
    Yu, Philip S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (09) : 2507 - 2521
  • [17] A Novel Image Compression Algorithm Based on the Biorthogonal Invariant Set Multiwavelet
    Li, Yongjun
    Li, Yunsong
    Liu, Weijia
    SATELLITE DATA COMPRESSION, COMMUNICATIONS, AND PROCESSING XI, 2015, 9501
  • [18] Performance Enhanced Secure Spatial Keyword Similarity Query With Arbitrary Spatial Ranges
    Zhang, Songnian
    Lu, Rongxing
    Zhu, Hui
    Zheng, Yandong
    Guan, Yunguo
    Wang, Fengwei
    Shao, Jun
    Li, Hui
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 5272 - 5285
  • [19] A Two -Level Signature Scheme for Stable Set Similarity Joins
    Schmitt, Daniel
    Kocher, Daniel
    Augsten, Nikolaus
    Mann, Willi
    Miller, Alexander
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (11): : 2686 - 2698
  • [20] A New Angle Set-Based Absolute Scaling-free Reconfigurable Cordic Algorithm
    Changela, Ankur
    Zaveri, Mazad
    Kumar, Yogesh
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (12) : 7404 - 7432