LSH-based distributed similarity indexing with load balancing in high-dimensional space

被引:10
|
作者
Wu, Jiagao [1 ,2 ]
Shen, Lu [1 ,2 ]
Liu, Linfeng [1 ,2 ]
机构
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
基金
中国国家自然科学基金;
关键词
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
D O I
10.1007/s11227-019-03047-6
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed and multiple hash tables are needed to guarantee the search quality. However, due to the limitation of storage space and processing capacity of the server, the centralized indexing schemes become impractical for massive data objects. Therefore, several distributed indexing schemes based on peer-to-peer (P2P) networks are proposed, whereas how to ensure load balancing is still one of the key issues. To solve the problem, in this paper, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous l2\documentclass[12pt]{minimal}earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. Then, we propose a static distributed indexing scheme with a novel load balancing indexing mapping method based on the cumulative distribution function by our models. Furthermore, we propose a dynamic load rebalancing algorithm using virtual node method of P2P networks to make the static indexing scheme more practical and robust. The experiments based on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.
引用
收藏
页码:636 / 665
页数:30
相关论文
共 50 条
  • [31] Clustering for approximate similarity search in high-dimensional spaces
    Li, C
    Chang, E
    Garcia-Molina, H
    Wiederhold, G
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (04) : 792 - 808
  • [32] Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset
    Vanahalli, Manjunath K.
    Patil, Nagamma
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 144 : 136 - 152
  • [33] Epsilon grid order:: An algorithm for the similarity join on massive high-dimensional data
    Böhm, C
    Braunmüller, B
    Krebs, F
    Kriege, HP
    SIGMOD RECORD, 2001, 30 (02) : 379 - 388
  • [34] High-dimensional indexing technologies for large scale content-based image retrieval: a review
    Ai, Lie-fu
    Yu, Jun-qing
    He, Yun-feng
    Guan, Tao
    JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS, 2013, 14 (07): : 505 - 520
  • [35] An efficient high-dimensional indexing method for content-based retrieval in large image databases
    Daoudi, I.
    Idrissi, K.
    Ouatik, S. E.
    Baskurt, A.
    Aboutajdine, D.
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2009, 24 (10) : 775 - 790
  • [36] Load balancing based on similarity multi-paths routing
    Xu, WP
    Yan, PL
    Xia, DL
    Wu, M
    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, 2005, 3758 : 345 - 357
  • [37] Action Recognition in a High-Dimensional Feature Space
    Adiguzel, Hande
    Erdem, Hayrettin
    Ferhatosmanoglu, Hakan
    Duygulu, Pinar
    2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,
  • [38] A Load Balancing Model based on Load-aware for Distributed Controllers
    Shang, Fengjun
    Gong, Wenjuan
    PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON MACHINERY, MATERIALS AND COMPUTING TECHNOLOGY, 2016, 60 : 240 - 244
  • [39] REPORTING NEIGHBORS IN HIGH-DIMENSIONAL EUCLIDEAN SPACE
    Aiger, Dror
    Kaplan, Haim
    Sharir, Micha
    SIAM JOURNAL ON COMPUTING, 2014, 43 (04) : 1363 - 1395
  • [40] A fast and scalable similarity search in high-dimensional image datasets
    Hanyf Y.
    Silkan H.
    International Journal of Computer Applications in Technology, 2019, 59 (01): : 95 - 104