Multi-Resolution Hashing for Fast Pairwise Summations

被引:1
|
作者
Charikar, Moses [1 ]
Siminelakis, Paris [2 ]
机构
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
关键词
Hashing; Kernel Density; Partition Function; Importance Sampling; Sub-linear algorithms; APPROXIMATE NEAREST-NEIGHBOR; ALGORITHM; GENERATION; COMPLEXITY; BOUNDS;
D O I
10.1109/FOCS.2019.00051
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X and a pairwise function w(x,y), we study the problem of designing a data-structure that enables sublinear-time approximation of the summation of w(x,y) for all x in X for any query point y. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is constructing a collection of hash families, each inducing a different collision probability between points in the dataset, such that the pointwise supremum of the collision probabilities scales as the square root of the function w(x,y). This leads to a data-structure that approximates pairwise summations using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for log-convex functions of the inner product between two vectors. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density, Partition Function Estimation and sampling.
引用
收藏
页码:769 / 792
页数:24
相关论文
共 50 条
  • [1] Multi-resolution similarity hashing
    Roussev, Vassil
    Richard, Golden G., III
    Marziale, Lodovico
    DIGITAL INVESTIGATION, 2007, 4 (SUPPL.) : S105 - S113
  • [2] Multi-resolution sketches and locality sensitive hashing for fast trajectory processing
    Astefanoaei, Maria
    Cesaretti, Paul
    Katsikouli, Panagiota
    Goswami, Mayank
    Sarkar, Rik
    26TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2018), 2018, : 279 - 288
  • [3] Fast multi-resolution volume rendering
    Yang, YT
    Lin, F
    Seah, HS
    VOLUME GRAPHICS, 2000, : 185 - +
  • [4] A fast multi-resolution approach to tomographic PIV
    Stefano Discetti
    Tommaso Astarita
    Experiments in Fluids, 2012, 52 : 765 - 777
  • [5] Fast extraction of multi-resolution Gabor features
    Ilonen, Jarmo
    Kamarainen, Joni-Kristian
    Kalviainen, Heikki
    14TH INTERNATIONAL CONFERENCE ON IMAGE ANALYSIS AND PROCESSING, PROCEEDINGS, 2007, : 481 - +
  • [6] A fast multi-resolution approach to tomographic PIV
    Discetti, Stefano
    Astarita, Tommaso
    EXPERIMENTS IN FLUIDS, 2012, 52 (03) : 765 - 777
  • [7] Fast image replacement using multi-resolution approach
    Fang, CW
    Lien, JJJ
    COMPUTER VISION - ACCV 2006, PT II, 2006, 3852 : 509 - 520
  • [8] Fast multi-resolution colorization of high-resolution gray images
    Hu, Wei
    Qin, Kai-Huai
    Jisuanji Xuebao/Chinese Journal of Computers, 2009, 32 (05): : 1062 - 1068
  • [9] Fast image segmentation based on multi-resolution analysis and wavelets
    Kim, BG
    Shim, JI
    Park, DJ
    PATTERN RECOGNITION LETTERS, 2003, 24 (16) : 2995 - 3006
  • [10] Fast Image Restoration Method Based on the Multi-Resolution Layer
    Hsieh, Ching-Tang
    Chen, Yen-Liang
    Hsu, Chih-Hsu
    JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2009, 12 (04): : 439 - 448