Multi-Resolution Hashing for Fast Pairwise Summations

Cited by: 1
Authors
Charikar, Moses [1]
Siminelakis, Paris [2]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS 2019), 2019
Keywords
Hashing; Kernel Density; Partition Function; Importance Sampling; Sub-linear Algorithms; Approximate Nearest-Neighbor; Algorithm; Generation; Complexity; Bounds
DOI
10.1109/FOCS.2019.00051
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X and a pairwise function w(x, y), we study the problem of designing a data structure that enables sublinear-time approximation of the summation of w(x, y) over all x in X for any query point y. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is constructing a collection of hash families, each inducing a different collision probability between points in the dataset, such that the pointwise supremum of the collision probabilities scales as the square root of the function w(x, y). This leads to a data structure that approximates pairwise summations using a sublinear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumüller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for log-convex functions of the inner product between two vectors. Our method leads to data structures with sublinear query time that significantly improve upon random sampling and can be used for Kernel Density and Partition Function Estimation, as well as sampling.
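To make the key design principle concrete, here is a minimal, self-contained sketch of a single-resolution hashing-based estimator in the spirit of [Charikar, Siminelakis FOCS'17], not the paper's construction. It works on the Hamming cube with a toy kernel w(x, y) = s(x, y)^t for similarity s = 1 - Hamming(x, y)/d, and uses bit-sampling LSH with k = t/2 coordinates because its collision probability is then exactly s^k = sqrt(w(x, y)), matching the square-root scaling highlighted in the abstract. All names and parameter values below are illustrative assumptions.

```python
import random

# Minimal sketch (illustrative, not the paper's construction) of a
# single-resolution Hashing-Based Estimator in the spirit of
# [Charikar, Siminelakis FOCS'17] on the Hamming cube. The paper's
# multi-resolution scheme would maintain several such structures at
# different resolutions k and combine them.

d, t = 64, 8              # toy dimension and kernel exponent (assumptions)
k = t // 2                # hash resolution chosen so that p = sqrt(w)
n, n_tables = 2000, 400   # dataset size and number of independent tables

rng = random.Random(0)
X = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(n)]

def sim(x, y):
    """Similarity s(x, y) = 1 - Hamming(x, y)/d."""
    return 1.0 - sum(a != b for a, b in zip(x, y)) / d

def w(x, y):
    """Toy kernel to be summed: w(x, y) = s(x, y)**t."""
    return sim(x, y) ** t

def make_hash():
    """Bit-sampling LSH: k coordinates drawn with replacement, so the
    collision probability is exactly sim(x, y)**k = sqrt(w(x, y))."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)

# Preprocessing: hash the dataset into n_tables independent tables.
tables = []
for _ in range(n_tables):
    h = make_hash()
    buckets = {}
    for x in X:
        buckets.setdefault(h(x), []).append(x)
    tables.append((h, buckets))

def hbe_estimate(y):
    """Unbiased estimate of (1/n) * sum over x in X of w(x, y).

    Per table: sample a uniform point from y's bucket and importance-
    weight its kernel value by 1/p(x, y); empty buckets contribute 0."""
    total = 0.0
    for h, buckets in tables:
        bucket = buckets.get(h(y), [])
        if bucket:
            x = rng.choice(bucket)
            p = sim(x, y) ** k    # collision probability = sqrt(w(x, y))
            total += (len(bucket) / n) * w(x, y) / p
    return total / n_tables

y = tuple(rng.randint(0, 1) for _ in range(d))
exact = sum(w(x, y) for x in X) / n
print(f"exact={exact:.6f}  estimate={hbe_estimate(y):.6f}")
```

Choosing p proportional to sqrt(w) balances the variance contributions of near and far points, which is why such estimators can beat uniform random sampling when the sum is dominated by a few close points; the abstract's multi-resolution construction extends this single-family sketch by taking the pointwise supremum of collision probabilities over a whole collection of hash families.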
Pages: 769-792
Page count: 24
相关论文
共 70 条
  • [1] Ahle TD, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P239
  • [2] Allen-Zhu Z., 2016, ADV NEURAL INFORM PR, P1642
  • [3] Allen-Zhu ZY, 2016, PR MACH LEARN RES, V48
  • [4] Allen-Zhu Z, 2016, PR MACH LEARN RES, V48
  • [5] Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
    Allen-Zhu, Zeyuan
    [J]. STOC'17: PROCEEDINGS OF THE 49TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2017, : 1200 - 1205
  • [6] Alon N., 1996, Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, P20, DOI 10.1145/237814.237823
  • [7] Andoni A, 2006, ANN IEEE SYMP FOUND, P459
  • [8] Andoni A, 2015, ADV NEUR IN, V28
  • [9] Andoni A, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P67
  • [10] Andoni A, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P47