LSH-based distributed similarity indexing with load balancing in high-dimensional space

被引:10
|
作者
Wu, Jiagao [1 ,2 ]
Shen, Lu [1 ,2 ]
Liu, Linfeng [1 ,2 ]
机构
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
基金
中国国家自然科学基金;
关键词
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
D O I
10.1007/s11227-019-03047-6
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed and multiple hash tables are needed to guarantee the search quality. However, due to the limitation of storage space and processing capacity of the server, the centralized indexing schemes become impractical for massive data objects. Therefore, several distributed indexing schemes based on peer-to-peer (P2P) networks are proposed, whereas how to ensure load balancing is still one of the key issues. To solve the problem, in this paper, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous l2\documentclass[12pt]{minimal}earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. Then, we propose a static distributed indexing scheme with a novel load balancing indexing mapping method based on the cumulative distribution function by our models. Furthermore, we propose a dynamic load rebalancing algorithm using virtual node method of P2P networks to make the static indexing scheme more practical and robust. The experiments based on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.
引用
收藏
页码:636 / 665
页数:30
相关论文
共 50 条
  • [21] DISCOVERING ROBUST EMBEDDINGS IN (DIS)SIMILARITY SPACE FOR HIGH-DIMENSIONAL LINGUISTIC FEATURES
    Mu, Tingting
    Miwa, Makoto
    Tsujii, Junichi
    Ananiadou, Sophia
    COMPUTATIONAL INTELLIGENCE, 2014, 30 (02) : 285 - 315
  • [22] qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces
    Jafari, Omid
    Ossorgin, John
    Nagarkar, Parth
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 329 - 333
  • [23] Irregularity in high-dimensional space-filling curves
    Mokbel, Mohamed F.
    Aref, Walid G.
    DISTRIBUTED AND PARALLEL DATABASES, 2011, 29 (03) : 217 - 238
  • [24] Distributed high-dimensional similarity search approach for large-scale wireless sensor networks
    Hu, Haifeng
    He, Jiefang
    Wu, Jianshen
    Wang, Kun
    Zhuang, Wei
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2017, 13 (03):
  • [25] Efficient indexing of high-dimensional data through dimensionality reduction
    Goh, CH
    Lim, A
    Ooi, BC
    Tan, KL
    DATA & KNOWLEDGE ENGINEERING, 2000, 32 (02) : 115 - 130
  • [26] Load Balancing for Partition-based Similarity Search
    Tang, Xun
    Alabduljalil, Maha
    Jin, Xin
    Yang, Tao
    SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014, : 193 - 202
  • [27] Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space
    Ming Zhang
    Reda Alhajj
    Knowledge and Information Systems, 2010, 22 : 1 - 26
  • [28] The SS+-tree: An improved index structure for similarity searches in a high-dimensional feature space
    Kurniamati, R
    Jin, JS
    Shepherd, JA
    STORAGE AND RETRIEVAL FOR IMAGE AND VIDEO DATABASES V, 1997, 3022 : 110 - 120
  • [29] Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space
    Zhang, Ming
    Alhajj, Reda
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 22 (01) : 1 - 26
  • [30] Neural networks trained with high-dimensional functions approximation data in high-dimensional space
    Zheng, Jian
    Wang, Jianfeng
    Chen, Yanping
    Chen, Shuping
    Chen, Jingjin
    Zhong, Wenlong
    Wu, Wenling
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (02) : 3739 - 3750