Accelerating Progressive Set Similarity Join with the CPU-GPU Architecture

Times cited: 0
Authors
Yu, Lining [1]
Nie, Tiezheng [1]
Shen, Derong [1]
Kou, Yue [1]
Affiliation
[1] Northeastern University, College of Computer Science and Technology, Shenyang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Set similarity join; Progressive; CPU-GPU; Counting bloom filter
DOI
10.1016/j.bdr.2021.100267
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Set similarity join (SSJoin) is an important operation for finding similar set pairs in a given database, and it plays a core role in data integration, data cleaning, and data mining. Unlike traditional SSJoin methods, progressive SSJoin is designed for large datasets and aims to improve the efficiency of finding similar pairs within a limited running time: it reports possible partial matching pairs of the dataset as early as possible during processing. Moreover, many recent studies have shown that GPUs (Graphics Processing Units) can accelerate similarity join operations. This paper explores progressive SSJoin algorithms and accelerates them with the CPU-GPU architecture. We propose two progressive SSJoin methods, PSSJM and PBM. PSSJM uses an inverted index, while PBM relies on a counting Bloom filter and prefix filtering. In addition, we propose a GPU-based algorithm built on our progressive SSJoin method to accelerate processing. Comprehensive experiments on real-world datasets show that our methods produce better-quality results than the traditional method under limited time, and that the implementation on the CPU-GPU architecture achieves high speedups over the CPU-based method. (C) 2021 Elsevier Inc. All rights reserved.
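The abstract names two standard building blocks, prefix filtering over an inverted index and the counting Bloom filter, without describing them. The following is a minimal Python sketch of their textbook forms, not the authors' PSSJM or PBM and not their GPU algorithm. It assumes Jaccard similarity, a global token order that puts the rarest tokens first, and a counting Bloom filter hashed with BLAKE2b; the names `prefix_filter_join` and `CountingBloomFilter` are hypothetical. "Progressive" here only means that each verified pair is yielded as soon as it is found.

```python
import math
from collections import defaultdict
from hashlib import blake2b
from typing import Dict, Iterator, List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (two empty sets count as identical)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def prefix_filter_join(records: Sequence[Set[str]], t: float) -> Iterator[Tuple[int, int, float]]:
    """Hypothetical helper: yield (i, j, sim) for i < j with jaccard >= t.

    Pairs are yielded as soon as they are verified, so a caller can stop early
    and keep a partial result (a simple stand-in for "progressive" output).
    """
    # Global token order: rarest tokens first (ties broken by the token itself),
    # so prefixes consist of selective tokens and produce few candidates.
    freq: Dict[str, int] = defaultdict(int)
    for r in records:
        for tok in r:
            freq[tok] += 1

    index: Dict[str, List[int]] = defaultdict(list)  # token -> ids whose prefix contains it
    for j, r in enumerate(records):
        ordered = sorted(r, key=lambda tok: (freq[tok], tok))
        # Jaccard >= t implies overlap >= ceil(t * |r|), so only the first
        # |r| - ceil(t * |r|) + 1 tokens need to be indexed and probed.
        # The epsilon guards against float round-up; it can only lengthen the prefix.
        prefix_len = len(ordered) - math.ceil(t * len(ordered) - 1e-9) + 1
        candidates: Set[int] = set()
        for tok in ordered[:prefix_len]:
            candidates.update(index[tok])  # earlier records sharing a prefix token
            index[tok].append(j)
        # Verification: compute the exact similarity of each candidate pair.
        for i in sorted(candidates):
            sim = jaccard(records[i], r)
            if sim >= t:
                yield (i, j, sim)


class CountingBloomFilter:
    """Hypothetical sketch: a Bloom filter whose slots are counters, so items can be removed."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3) -> None:
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _slots(self, item: str) -> List[int]:
        # k slot indices from BLAKE2b digests of the item prefixed with the hash number.
        return [
            int.from_bytes(blake2b(f"{k}:{item}".encode(), digest_size=8).digest(), "little")
            % len(self.counters)
            for k in range(self.num_hashes)
        ]

    def add(self, item: str) -> None:
        for s in self._slots(item):
            self.counters[s] += 1

    def remove(self, item: str) -> None:
        # Only remove items that were added; otherwise other items' counts are corrupted.
        for s in self._slots(item):
            if self.counters[s] > 0:
                self.counters[s] -= 1

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.counters[s] > 0 for s in self._slots(item))


if __name__ == "__main__":
    records = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"x", "y"}, {"a", "b", "d"}]
    for pair in prefix_filter_join(records, 0.6):
        print(pair)  # prints (0, 1, 0.75), then (1, 3, 0.75)
```

Because `prefix_filter_join` is a generator, a caller with a time budget can stop consuming it early and still keep every pair found so far. The counting Bloom filter trades extra memory per slot for the ability to delete items, which a plain bit-array Bloom filter cannot do.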
Pages: 10
Related papers
50 in total
  • [1] Parallel String Similarity Join Approach Based on CPU-GPU Heterogeneous Architecture
    Xu K.
    Nie T.
    Shen D.
    Kou Y.
    Yu G.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (03) : 598 - 608
  • [2] Accelerating MapReduce on a Coupled CPU-GPU Architecture
    Chen, Linchuan
    Huo, Xin
    Agrawal, Gagan
    2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2012
  • [3] Accelerating Exact Similarity Search on CPU-GPU Systems
    Matsumoto, Takazumi
    Yiu, Man Lung
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, : 320 - 329
  • [4] HEGJoin: Heterogeneous CPU-GPU Epsilon Grids for Accelerated Distance Similarity Join
    Gallet, Benoit
    Gowanlock, Michael
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT III, 2020, 12114 : 372 - 388
  • [5] Column-Stored System Join Optimization on Coupled CPU-GPU Architecture
    Ding, Xiangwu
    Li, Zitong
    PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015, : 184 - 191
  • [6] ASW: Accelerating Smith-Waterman Algorithm on Coupled CPU-GPU Architecture
    Zou, Huihui
    Tang, Shanjiang
    Yu, Ce
    Fu, Hao
    Li, Yusen
    Tang, Wenjie
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2019, 47 (03) : 388 - 402
  • [7] Accelerating Pattern Matching with CPU-GPU Collaborative Computing
    Sanz, Victoria
    Pousa, Adrian
    Naiouf, Marcelo
    De Giusti, Armando
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT I, 2018, 11334 : 310 - 322
  • [8] Parallel Graph Partitioning on a CPU-GPU Architecture
    Goodarzi, Bahareh
    Burtscher, Martin
    Goswami, Dhrubajyoti
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 58 - 66
  • [9] CPU-GPU architecture for active noise control
    Kim, Yeongseok
    Park, Youngjin
    APPLIED ACOUSTICS, 2019, 153 : 1 - 13
  • [10] Accelerating Batched Power Flow on Heterogeneous CPU-GPU Platform
    Hao, Jiao
    Zhang, Zongbao
    He, Zonglin
    Liu, Zhengyuan
    Tan, Zhengdong
    Song, Yankan
    ENERGIES, 2024, 17 (24)