Compressing Locality Sensitive Hashing Tables

被引:1
作者
Santoyo, Francisco [1 ]
Chavez, Edgar [3 ]
Tellez, Eric S. [2 ]
机构
[1] Univ Michoacana San Nicol de Hidalgo, Fac Ingn Elect, Div Estudios Posgrad, Morelia, Michoacan, Mexico
[2] Univ Michoacana San Nicol de Hidalgo, Fac Fisico Math, Morelia, Michoacan, Mexico
[3] Univ Nacl Autonoma Mexico, Math Inst, Mexico City 04510, DF, Mexico
来源
2013 MEXICAN INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE (ENC 2013) | 2013年
关键词
Locality Sensitive Hashing; Succinct Proximity Search Indexes;
D O I
10.1109/ENC.2013.12
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
LSH is the industry standard for proximity searching tasks on collections of data having coordinates. An LSH index applies a set of hashing functions to the representation of an object to identify proximal objects to a query, leaving distal objects apart. In other words, objects with the same hash will be mutually proximal with high probability. LSH is very fast and gives probabilistic guarantees on the quality of the results. On the other hand, mobile applications using proximity queries are becoming common place. Feature extraction can be done in a smart phone. However, the actual query rely on a wireless link because memory is a scarce resource. To tackle the above problem, we present in this paper a method to compress the LSH index while still being able to query without decompressing. The query speed is practically the same, and can even be faster. We derive a lower bound on the memory requirements for the compress representation and present an implementation using close to optimal storage. We provide an extensive experimental comparison of our compressed representation against the un-compressed one over a large database of 55 million objects. We obtained a compression ratio ranging from 70% to 80% without slowing down, in practice, the search speed.
引用
收藏
页码:41 / 46
页数:6
相关论文
共 9 条
  • [1] Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
    Andoni, Alexandr
    Indyk, Piotr
    [J]. COMMUNICATIONS OF THE ACM, 2008, 51 (01) : 117 - 122
  • [2] [Anonymous], 2004, HDB DISCRETE COMPUTA
  • [3] Searching in metric spaces
    Chávez, E
    Navarro, G
    BaezaYates, R
    Marroquín, JL
    [J]. ACM COMPUTING SURVEYS, 2001, 33 (03) : 273 - 321
  • [4] Claude F, 2008, LECT NOTES COMPUT SC, V5280, P176
  • [5] Gionis A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P518
  • [6] Rank/Select Operations on Large Alphabets: a Tool for Text Indexing
    Golynski, Alexander
    Munro, J. Ian
    Rao, S. Srinivasa
    [J]. PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 368 - 373
  • [7] Gonzalez R, 2005, TRIQUARTERLY, P27
  • [8] A robust entropy-based Audio-Fingerprint
    Ibarrola, Antonio C.
    Chavez, Edgar
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1729 - +
  • [9] Tellez E. S., 2012, THESIS