Multi-Resolution Hashing for Fast Pairwise Summations

Cited by: 1
Authors
Charikar, Moses [1]
Siminelakis, Paris [2]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS 2019), 2019
Keywords
Hashing; Kernel Density; Partition Function; Importance Sampling; Sub-linear Algorithms; Approximate Nearest-Neighbor; Algorithm; Generation; Complexity; Bounds
DOI
10.1109/FOCS.2019.00051
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X and a pairwise function w(x, y), we study the problem of designing a data structure that enables sublinear-time approximation of the summation of w(x, y) over all x in X for any query point y. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is constructing a collection of hash families, each inducing a different collision probability between points in the dataset, such that the pointwise supremum of the collision probabilities scales as the square root of the function w(x, y). This leads to a data structure that approximates pairwise summations using a sublinear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumüller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for log-convex functions of the inner product between two vectors. Our method leads to data structures with sublinear query time that significantly improve upon random sampling and can be used for Kernel Density and Partition Function Estimation, as well as sampling.
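To make the key design principle concrete, here is a minimal, self-contained sketch of a single-resolution hashing-based estimator in the spirit of [Charikar, Siminelakis FOCS'17], not the paper's construction. It works on the Hamming cube with a toy kernel w(x, y) = s(x, y)^t for similarity s = 1 - Hamming(x, y)/d, and uses bit-sampling LSH with k = t/2 coordinates because its collision probability is then exactly s^k = sqrt(w(x, y)), matching the square-root scaling highlighted in the abstract. All names and parameter values below are illustrative assumptions.

```python
import random

# Minimal sketch (illustrative, not the paper's construction) of a
# single-resolution Hashing-Based Estimator in the spirit of
# [Charikar, Siminelakis FOCS'17] on the Hamming cube. The paper's
# multi-resolution scheme would maintain several such structures at
# different resolutions k and combine them.

d, t = 64, 8              # toy dimension and kernel exponent (assumptions)
k = t // 2                # hash resolution chosen so that p = sqrt(w)
n, n_tables = 2000, 400   # dataset size and number of independent tables

rng = random.Random(0)
X = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(n)]

def sim(x, y):
    """Similarity s(x, y) = 1 - Hamming(x, y)/d."""
    return 1.0 - sum(a != b for a, b in zip(x, y)) / d

def w(x, y):
    """Toy kernel to be summed: w(x, y) = s(x, y)**t."""
    return sim(x, y) ** t

def make_hash():
    """Bit-sampling LSH: k coordinates drawn with replacement, so the
    collision probability is exactly sim(x, y)**k = sqrt(w(x, y))."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)

# Preprocessing: hash the dataset into n_tables independent tables.
tables = []
for _ in range(n_tables):
    h = make_hash()
    buckets = {}
    for x in X:
        buckets.setdefault(h(x), []).append(x)
    tables.append((h, buckets))

def hbe_estimate(y):
    """Unbiased estimate of (1/n) * sum over x in X of w(x, y).

    Per table: sample a uniform point from y's bucket and importance-
    weight its kernel value by 1/p(x, y); empty buckets contribute 0."""
    total = 0.0
    for h, buckets in tables:
        bucket = buckets.get(h(y), [])
        if bucket:
            x = rng.choice(bucket)
            p = sim(x, y) ** k    # collision probability = sqrt(w(x, y))
            total += (len(bucket) / n) * w(x, y) / p
    return total / n_tables

y = tuple(rng.randint(0, 1) for _ in range(d))
exact = sum(w(x, y) for x in X) / n
print(f"exact={exact:.6f}  estimate={hbe_estimate(y):.6f}")
```

Choosing p proportional to sqrt(w) balances the variance contributions of near and far points, which is why such estimators can beat uniform random sampling when the sum is dominated by a few close points; the abstract's multi-resolution construction extends this single-family sketch by taking the pointwise supremum of collision probabilities over a whole collection of hash families.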
Pages: 769-792
Page count: 24
相关论文
共 70 条
  • [1] Ahle TD, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P239
  • [2] Allen-Zhu Z., 2016, ADV NEURAL INFORM PR, P1642
  • [3] Allen-Zhu ZY, 2016, PR MACH LEARN RES, V48
  • [4] Allen-Zhu Z, 2016, PR MACH LEARN RES, V48
  • [5] Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
    Allen-Zhu, Zeyuan
    [J]. STOC'17: PROCEEDINGS OF THE 49TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2017, : 1200 - 1205
  • [6] Alon N., 1996, Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, P20, DOI 10.1145/237814.237823
  • [7] Andoni A, 2006, ANN IEEE SYMP FOUND, P459
  • [8] Andoni A, 2015, ADV NEUR IN, V28
  • [9] Andoni A, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P67
  • [10] Andoni A, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P47