LSH-based distributed similarity indexing with load balancing in high-dimensional space

Cited by: 10
Authors
Wu, Jiagao [1 ,2 ]
Shen, Lu [1 ,2 ]
Liu, Linfeng [1 ,2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
DOI
10.1007/s11227-019-03047-6
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed, and multiple hash tables are needed to guarantee search quality. However, because of the limited storage space and processing capacity of a single server, centralized indexing schemes become impractical for massive numbers of data objects. Several distributed indexing schemes based on peer-to-peer (P2P) networks have therefore been proposed, but ensuring load balancing remains one of the key issues. To solve this problem, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous ℓ2 norms. Unlike earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. Then, we propose a static distributed indexing scheme with a novel load-balancing index mapping method based on the cumulative distribution function derived from our models. Furthermore, we propose a dynamic load rebalancing algorithm that uses the virtual node method of P2P networks to make the static indexing scheme more practical and robust. Experiments on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.
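To illustrate why a mapping based on the cumulative distribution function can balance load for a single hash table, the following is a minimal sketch and not the authors' scheme. It assumes a standard Gaussian-projection LSH function h(v) = floor((a·v + b) / w), a dataset whose vectors all share one ℓ2 norm with uniformly random directions, and a P2P key space [0, 1) split into equal ranges per node. Under those assumptions the projection a·v is approximately Gaussian, so passing each bucket through the Gaussian CDF spreads keys roughly uniformly. All function names, parameters, and the naive linear baseline are hypothetical choices made for this example.

```python
import math
import random


def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def random_vector(dim, norm, rng):
    """Random direction scaled to a fixed l2 norm (homogeneous-norm assumption)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(x * x for x in v))
    return [norm * x / length for x in v]


def key_to_node(key, num_nodes):
    """Split the key space [0, 1) into equal ranges, one range per node."""
    return min(int(key * num_nodes), num_nodes - 1)


def simulate_loads(dim=32, norm=5.0, width=0.1, num_objects=20000,
                   num_nodes=8, seed=0):
    """Return (cdf_loads, naive_loads): objects per node under each mapping."""
    rng = random.Random(seed)
    # One hash table means one projection vector a and one offset b.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    # Assumption: for fixed a and random directions of length `norm`,
    # a.v is approximately N(0, sigma^2) with sigma = norm * ||a|| / sqrt(dim).
    a_norm = math.sqrt(sum(x * x for x in a))
    sigma = norm * a_norm / math.sqrt(dim)

    cdf_loads = [0] * num_nodes
    naive_loads = [0] * num_nodes
    for _ in range(num_objects):
        v = random_vector(dim, norm, rng)
        proj = sum(ai * vi for ai, vi in zip(a, v))
        bucket = math.floor((proj + b) / width)   # LSH hash value
        centre = (bucket + 0.5) * width - b       # bucket centre in projection space
        # CDF mapping: all objects in one bucket share one key, hence one node.
        cdf_key = gaussian_cdf(centre, sigma)
        # Naive baseline: linear rescaling of [-4*sigma, 4*sigma] onto [0, 1].
        naive_key = min(max((centre + 4 * sigma) / (8 * sigma), 0.0), 1.0)
        cdf_loads[key_to_node(cdf_key, num_nodes)] += 1
        naive_loads[key_to_node(naive_key, num_nodes)] += 1
    return cdf_loads, naive_loads


def imbalance(loads):
    """Ratio of the heaviest node's load to the mean load (1.0 is perfect)."""
    return max(loads) * len(loads) / sum(loads)


if __name__ == "__main__":
    cdf_loads, naive_loads = simulate_loads()
    print("CDF mapping loads:  ", cdf_loads, "imbalance", round(imbalance(cdf_loads), 3))
    print("Naive mapping loads:", naive_loads, "imbalance", round(imbalance(naive_loads), 3))
```

In this sketch the linear baseline concentrates most objects on the nodes that own the middle of the projection range, while the CDF mapping keeps per-node loads close to the mean. Datasets with heterogeneous norms would need a different distribution model, and dynamic rebalancing with virtual nodes is not covered here.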
Pages: 636-665
Number of pages: 30