A Trie Based Set Similarity Query Algorithm

被引:1
|
作者
Jia, Lianyin [1 ,2 ]
Tang, Junzhuo [1 ]
Li, Mengjuan [3 ]
Li, Runxin [1 ]
Ding, Jiaman [1 ]
Chen, Yinong [4 ]
机构
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
基金
中国国家自然科学基金;
关键词
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT;
D O I
10.3390/math11010229
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most of the existing algorithms are inverted index based, they usually filter unqualified sets one by one and do not have sufficient support for duplicated sets, thus leading to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie based index for set similarity query, which can naturally group sets with the same prefix into one node, and can filter all sets corresponding to the node at a time, thereby significantly improving the candidates generation efficiency. In this paper, we find that the set similarity query problem can be transformed into matching nodes of the first-layer (FMNodes) detecting problem on T-starTrie. Therefore, an efficient FLMNode detection algorithm is designed. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
引用
收藏
页数:13
相关论文
共 50 条
  • [1] An efficient length-segmented inverted index-based set similarity query algorithm
    Li, Mengjuan
    Jia, Lianyin
    Hu, Juntao
    Zhang, Ruiqi
    Wei, Shoulin
    Pan, Mengni
    INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2022, 16 (01) : 85 - 95
  • [2] Similarity Query Processing for Probabilistic Sets
    Gao, Ming
    Jin, Cheqing
    Wang, Wei
    Lin, Xuemin
    Zhou, Aoying
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, : 913 - 924
  • [3] A Transformation-Based Framework for KNN Set Similarity Search
    Zhang, Yong
    Wu, Jiacheng
    Wang, Jin
    Xing, Chunxiao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (03) : 409 - 423
  • [4] Point Cloud Registration Algorithm Based on Cosine Similarity
    Zhan Xu
    Cai Yong
    LASER & OPTOELECTRONICS PROGRESS, 2020, 57 (12)
  • [5] A nearest neighbor query method for searching objects with time and location informations based on spatiotemporal similarity
    Qian, Shenyi
    Tian, Ziqiao
    EVOLUTIONARY INTELLIGENCE, 2024, 17 (04) : 3031 - 3041
  • [6] Data structure set-trie for storing and querying sets: Theoretical and empirical analysis
    Savnik, Iztok
    Akulich, Mikita
    Krnc, Matjaz
    Skrekovski, Riste
    PLOS ONE, 2021, 16 (02):
  • [7] Window-based multiple continuous query algorithm for data streams
    Liu, Wen
    Zhang, Tuqian
    Liu, Junxia
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (09) : 5782 - 5807
  • [8] KSQ: Top-k Similarity Query on Uncertain Trajectories
    Ma, Chunyang
    Lu, Hua
    Shou, Lidan
    Chen, Gang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (09) : 2049 - 2062
  • [9] CP-Trie: Cumulative PopCount based Trie for IPv6 Routing Table Lookup in Software and ASIC
    Islam, Md Iftakharul
    Khan, Javed, I
    2021 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING (IEEE HPSR), 2021,
  • [10] Parallel set similarity join on big data based on Locality-Sensitive Hashing
    Sohrabi, Mohammad Karim
    Azgomi, Hosseion
    SCIENCE OF COMPUTER PROGRAMMING, 2017, 145 : 1 - 12