A Trie Based Set Similarity Query Algorithm

被引:1
作者
Jia, Lianyin [1 ,2 ]
Tang, Junzhuo [1 ]
Li, Mengjuan [3 ]
Li, Runxin [1 ]
Ding, Jiaman [1 ]
Chen, Yinong [4 ]
机构
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
基金
中国国家自然科学基金;
关键词
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT;
D O I
10.3390/math11010229
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most of the existing algorithms are inverted index based, they usually filter unqualified sets one by one and do not have sufficient support for duplicated sets, thus leading to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie based index for set similarity query, which can naturally group sets with the same prefix into one node, and can filter all sets corresponding to the node at a time, thereby significantly improving the candidates generation efficiency. In this paper, we find that the set similarity query problem can be transformed into matching nodes of the first-layer (FMNodes) detecting problem on T-starTrie. Therefore, an efficient FLMNode detection algorithm is designed. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
引用
收藏
页数:13
相关论文
共 50 条
  • [31] A level set redistancing algorithm for simulation of two-phase flow
    An, R. D.
    Yu, C. H.
    [J]. NUMERICAL HEAT TRANSFER PART B-FUNDAMENTALS, 2020, 78 (01) : 30 - 53
  • [32] Researching Why-Not Questions in Skyline Query Based on Orthogonal Range
    Sun, Ping
    Liang, Caimei
    Li, Guohui
    Yuan, Ling
    [J]. ELECTRONICS, 2020, 9 (03)
  • [33] Similarity-based constraint score for feature selection
    Salmi, Abderezak
    Hammouche, Kamal
    Macaire, Ludovic
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 209
  • [34] Join query optimization in the distributed database system using an artificial bee colony algorithm and genetic operators
    Panahi, Vahideh
    Navimipour, Nima Jafari
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (17)
  • [35] GPU-Based Parallel Indexing for Concurrent Spatial Query Processing
    Nouri, Zhila
    Tu, Yi-Cheng
    [J]. 30TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT (SSDBM 2018), 2018,
  • [36] An Index-based Secure Query Processing Scheme for Outsourced Databases
    Akiyama, Kento
    Shinozuka, Chisato
    Watanabe, Chiemi
    Amagasa, Toshiyuki
    Kitagawa, Hiroyuki
    [J]. 19TH INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS2017), 2017, : 215 - 223
  • [37] Adaptive Distributed RDF Graph Fragmentation and Allocation based on Query Workload
    Peng, Peng
    Zou, Lei
    Chen, Lei
    Zhao, Dongyan
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (04) : 670 - 685
  • [38] Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching
    Wang, Hanchen
    Zhang, Ying
    Qin, Lu
    Wang, Wei
    Zhang, Wenjie
    Lin, Xuemin
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 245 - 258
  • [39] A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications
    Darwish, Saad M.
    Mhaimeed, Ibrahim Abdullah
    Elzoghabi, Adel A.
    [J]. ENTROPY, 2023, 25 (09)
  • [40] A Quantum-Based Database Query Scheme for Privacy Preservation in Cloud Environment
    Liu, Wenjie
    Gao, Peipei
    Liu, Zhihao
    Chen, Hanwu
    Zhang, Maojun
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2019, 2019