P-QALSH plus : Exploiting Multiple Cores to Parallelize Query-Aware Locality-Sensitive Hashing on Big Data

被引:0
作者
Huang, Yikai [1 ]
Hu, Zezhao [1 ]
Feng, Jianlin [1 ]
机构
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
来源
WEB AND BIG DATA, PT II, APWEB-WAIM 2023 | 2024年 / 14332卷
关键词
Approximate Nearest Neighbor Search; Locality-Sensitive Hashing; Parallel Computing; Big Data Mining;
D O I
10.1007/978-981-97-2390-4_3
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Approximate nearest neighbor (ANN) search in high dimensional Euclidean space is a fundamental problem of big data processing. Locality-Sensitive Hashing (LSH) is a popular scheme to solve the ANN search problem. In the index phase, an LSH scheme needs to preprocess multiple hash tables, and in the query phase it exploits the preprocessed hash tables to speedup the ANN search. Query-Aware LSH (QALSH), a state-of-the-art LSH scheme, has rigorous theoretical guarantee on query accuracy, while suffering from high time overhead in the index and query phase. To improve the query efficiency, a multi-core parallel QALSH scheme called P-QALSH was proposed, which is mainly optimized for the query phase. In this paper, we further extend P-QALSH to P-QALSH+, which parallelizes QALSH in both the index and query phases based on multiple cores. Specifically, we first propose a Parallel Table Design to fully accelerate the index construction. Then, we follow P-QALSH to exploit a novel K-Counter Parallel Counting Technology and a novel Search Radius Estimation Strategy to improve the query performance. Using six real-world datasets and eight synthetic datasets, we have performed extensive experiments on a 16-core machine. Experimental results demonstrate the superiority of P-QALSH+ in terms of efficiency of parallel computing. Specifically, compared to QALSH, P-QALSH+ is 10-12X faster on index construction, and achieves 6-8X speedup on query search, and notably shows obvious improvement in query accuracy.
引用
收藏
页码:28 / 43
页数:16
相关论文
共 20 条
  • [1] Anastasiu D.C., 2019, SIGKDD. KDD 2019
  • [2] Efficient Locality-Sensitive Hashing Over High-Dimensional Data Streams
    Yang, Chengcheng
    Deng, Dong
    Shang, Shuo
    Shao, Ling
    [J]. 2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 1994 - 1997
  • [3] Histograms of oriented gradients for human detection
    Dalal, N
    Triggs, B
    [J]. 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, : 886 - 893
  • [4] New Trends in High-D Vector Similarity Search: AI-driven, Progressive, and Distributed
    Echihabi, Karima
    Zoumpatianos, Kostas
    Palpanas, Themis
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 14 (12): : 3198 - 3201
  • [5] High-Dimensional Similarity Search for Scalable Data Science
    Echihabi, Karima
    Zoumpatianos, Kostas
    Palpanas, Themis
    [J]. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 2369 - 2372
  • [6] Gan Junhao, 2012, P ACM SIGMOD INT C M, P541, DOI [10.1145/2213836.2213898, DOI 10.1145/2213836.2213898]
  • [7] Huang Q, 2015, PROC VLDB ENDOW, V9, P1
  • [8] P-QALSH: Parallelizing Query Aware Locality-Sensitive Hashing for Big Data
    Huang, Yikai
    Yao, Zhili
    Feng, Jianlin
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 629 - 635
  • [9] Indyk P., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P604, DOI 10.1145/276698.276876
  • [10] Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors
    Jafari, Omid
    Nagarkar, Parth
    Montano, Jonathan
    [J]. SIMILARITY SEARCH AND APPLICATIONS, SISAP 2020, 2020, 12440 : 323 - 337