Locality-Sensitive Hashing for Information Retrieval System on Multiple GPGPU Devices

被引:7
作者
Toan Nguyen Mau [1 ]
Inoguchi, Yasushi [2 ]
机构
[1] Japan Adv Inst Sci & Technol, Grad Sch Informat Sci, Inoguchi Lab, 1-1 Asahidai, Nomi, Ishikawa 9231211, Japan
[2] Japan Adv Inst Sci & Technol, Res Ctr Adv Comp Infrastruct, 1-1 Asahidai, Nomi, Ishikawa 9231211, Japan
来源
APPLIED SCIENCES-BASEL | 2020年 / 10卷 / 07期
关键词
locality-sensitive hashing; structured data set; GPGPU; similarity searching; parallel processing; distributed memory; NEAREST-NEIGHBOR;
D O I
10.3390/app10072539
中图分类号
O6 [化学];
学科分类号
0703 ;
摘要
It is challenging to build a real-time information retrieval system, especially for systems with high-dimensional big data. To structure big data, many hashing algorithms that map similar data items to the same bucket to advance the search have been proposed. Locality-Sensitive Hashing (LSH) is a common approach for reducing the number of dimensions of a data set, by using a family of hash functions and a hash table. The LSH hash table is an additional component that supports the indexing of hash values (keys) for the corresponding data/items. We previously proposed the Dynamic Locality-Sensitive Hashing (DLSH) algorithm with a dynamically structured hash table, optimized for storage in the main memory and General-Purpose computation on Graphics Processing Units (GPGPU) memory. This supports the handling of constantly updated data sets, such as songs, images, or text databases. The DLSH algorithm works effectively with data sets that are updated with high frequency and is compatible with parallel processing. However, the use of a single GPGPU device for processing big data is inadequate, due to the small memory capacity of GPGPU devices. When using multiple GPGPU devices for searching, we need an effective search algorithm to balance the jobs. In this paper, we propose an extension of DLSH for big data sets using multiple GPGPUs, in order to increase the capacity and performance of the information retrieval system. Different search strategies on multiple DLSH clusters are also proposed to adapt our parallelized system. With significant results in terms of performance and accuracy, we show that DLSH can be applied to real-life dynamic database systems.
引用
收藏
页数:18
相关论文
共 22 条
  • [1] Andoni A, 2006, ANN IEEE SYMP FOUND, P459
  • [2] [Anonymous], SCALABLE DYNAMIC LOC
  • [3] [Anonymous], NVIDIA CUDA C PROGR
  • [4] [Anonymous], ACC SUP AI ER
  • [5] [Anonymous], 2011, FDN LARGE SCALE MULT
  • [6] [Anonymous], 2012, P JOINT C HOK CHAPT
  • [7] [Anonymous], P INT WORKSH NONL CI
  • [8] Barker B., 2015, WORKSH HIGH PERF COM, V262
  • [9] Social big data: Recent achievements and new challenges
    Bello-Orgaz, Gema
    Jung, Jason J.
    Camacho, David
    [J]. INFORMATION FUSION, 2016, 28 : 45 - 59
  • [10] Blanas Spyros, 2011, P ACM SIGMOD INT C M, P37, DOI [10.1145/1989323.1989328Greece, DOI 10.1145/1989323.1989328GREECE, 10.1145/1989323.1989328, DOI 10.1145/1989323.1989328]