Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket

被引:0
作者
Fang, Bo [1 ]
Hua, Zhongyun [1 ]
Huang, Hejiao [1 ]
机构
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China
来源
14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019) | 2019年
基金
国家重点研发计划; 中国国家自然科学基金;
关键词
Locality Sensitive Hashing; Nearest Neighbor Search; Heapsort;
D O I
10.1109/iccse.2019.8845438
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Nearest neighbor search (NNS) is one of the current popular research directions, which widely used in machine learning, pattern recognition, image detection and so on. In the low dimension data, based on tree search method can get. good results. But when the data dimension goes up, that will produce a curse of dimensional. The proposed Locality-Sensitive Hashing algorithm (LSH) greatly improves the efficiency of nearest neighbor query for high dimensional data. But the algorithm relies on the building a large number of hash table, which makes the space complexity very high. C2LSH based on dynamic collision improves the disadvantage of LSH, but its disadvantage is that it needs to detect the collision times of a large number of data points which Increased query time. Therefore, Based on LSH algorithm, later researchers put forward many improved algorithms, hut still not ideal. In this paper, we put forward Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket (HSLSH) algorithm aiming at the shortcomings of LSH and C2LSH. Its main idea is to take advantage of the efficiency of heapsort in massive data sorting to improve the efficiency of nearest neighbor query. It only needs to rely on a small number of hash functions can not only overcome the shortcoming of LSH need to build a large number of hash table, and avoids defects of C2LSH. Experiments show that our algorithm is more than 20% better than C2LSH in query accuracy and 40% percent lower in query time.
引用
收藏
页码:5 / 10
页数:6
相关论文
共 14 条
  • [1] Bentley J. L., 1990, S COMPUTATIONA UNPUB
  • [2] Datar M., 2004, P ACM S COMP GEOM
  • [3] Fu HL, 2010, 2010 INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT (CCCM2010), VOL III, P35
  • [4] Gan J., 2012, P ACM SIGMOD INT C M
  • [5] Guttman A., 1984, SIGMOD Record, V14, P47, DOI 10.1145/971697.602266
  • [6] Indyk P., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P604, DOI 10.1145/276698.276876
  • [7] Joly A, 2008, P 16 ACM INT C MULT, P209
  • [8] Katayama N, 1997, SIGMOD
  • [9] Lu Ying-Hua, 2014, IMPROVED LOCALITY SE
  • [10] Lv Q., 2007, VLDB