LSH-based distributed similarity indexing with load balancing in high-dimensional space

Cited by: 10
Authors
Wu, Jiagao [1 ,2 ]
Shen, Lu [1 ,2 ]
Liu, Linfeng [1 ,2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
DOI
10.1007/s11227-019-03047-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed, and multiple hash tables are needed to guarantee search quality. However, because the storage space and processing capacity of a single server are limited, centralized indexing schemes become impractical for massive collections of data objects. Several distributed indexing schemes based on peer-to-peer (P2P) networks have therefore been proposed, but ensuring load balancing remains one of the key issues. To solve this problem, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous l2 norms. Unlike earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. We then propose a static distributed indexing scheme with a novel load-balancing index mapping method based on the cumulative distribution function derived from our models. Furthermore, we propose a dynamic load rebalancing algorithm that uses the virtual node method of P2P networks to make the static indexing scheme more practical and robust. Experiments on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.
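The idea of a load-balancing index mapping based on a cumulative distribution function can be illustrated with a minimal Python sketch. It is not the authors' scheme: it assumes a single Gaussian (2-stable) projection, treats the projected value of a vector v as approximately N(0, ||v||^2), and passes it through the normal CDF so that keys spread roughly uniformly over the nodes. The names `make_projection`, `node_for`, and the node count are hypothetical.

```python
import math
import random


def make_projection(dim, rng):
    """Draw one Gaussian (2-stable) projection vector a ~ N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project(a, v):
    """Raw LSH projection a . v (before any bucketing)."""
    return sum(ai * vi for ai, vi in zip(a, v))


def normal_cdf(x, sigma):
    """CDF of N(0, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def node_for(v, a, num_nodes):
    """Map vector v to a node index in [0, num_nodes).

    Assumption: a . v is modeled as N(0, ||v||^2), so the CDF value u is
    roughly uniform on [0, 1] and equal-width slices of u carry similar load.
    Nearby projections still land on the same or adjacent nodes.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        u = 0.5  # the zero vector always projects to 0
    else:
        u = normal_cdf(project(a, v), norm)
    return min(int(u * num_nodes), num_nodes - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    dim, num_nodes = 64, 4
    a = make_projection(dim, rng)
    load = [0] * num_nodes
    for _ in range(2000):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        load[node_for(v, a, num_nodes)] += 1
    print(load)  # per-node object counts
```

Because the CDF is monotone, the mapping preserves the order of projected values, which is what keeps similar objects on the same or neighboring nodes. Under these assumptions the balance is only approximate for one fixed projection vector; a dynamic rebalancing step such as the virtual node method mentioned in the abstract would address the residual skew.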
Pages: 636-665
Page count: 30
Source
The Journal of Supercomputing, 2020, 76: 636-665