Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

被引：0

作者：

Aumuller, Martin ^{[1
]}

Pagh, Rasmus ^{[1
,2
]}

Silvestri, Francesco ^{[3
]}

机构：

[1] IT Univ Copenhagen, Copenhagen, Denmark

[2] BARC, Copenhagen, Denmark

[3] Univ Padua, Padua, Italy

来源：

PODS'20: PROCEEDINGS OF THE 39TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS | 2020年

关键词：

Similarity search; Locality Sensitive Hashing; Fairness; Sampling;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r -near neighbor (r-NN) problem: given a radius r > 0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of equal opportunity: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN where all points in S that are near q have the same probability to be selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights (un)fairness in a recommendation setting on real-world datasets and discusses the inherent unfairness introduced by solving other variants of the problem.

引用

页码：191 / 204

页数：14

共 50 条

[1] Fair near neighbor search via sampling
Aumuller, Martin
Har-Peled, Sariel
Mahabadi, Sepideh
Pagh, Rasmus
Silvestri, Francesco
SIGMOD RECORD, 2021, 50 (01) : 42 - 49
[2] Technical Perspective: Fair Near Neighbor Search via Sampling
Zhang, Qin
SIGMOD RECORD, 2021, 50 (01) : 41 - 41
[3] Sampling a Near Neighbor in High Dimensions - Who is the Fairest of Them All?
Aumuller, Martin
Har-Peled, Sariel
Mahabadi, Sepideh
Pagh, Rasmus
Silvestri, Francesco
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2022, 47 (01):
[4] A simple algorithm for nearest neighbor search in high dimensions
Nene, SA
Nayar, SK
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (09) : 989 - 1003
[5] Entropy based Nearest Neighbor Search in High Dimensions
Panigrahy, Rina
PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 1186 - 1195
[6] Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions
Buabal, Ruben
Homaifarl, Abdollah
Hendrix, William
Son, Seung Woo
Liao, Wei-keng
Choudhary, Alok
JOURNAL OF PATTERN RECOGNITION RESEARCH, 2014, 9 (01): : 111 - 122
[7] PARALLEL ALGORITHMS FOR NEAREST NEIGHBOR SEARCH PROBLEMS IN HIGH DIMENSIONS
Xiao, Bo
Biros, George
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2016, 38 (05): : S667 - S699
[8] Differentially Private Approximate Near Neighbor Counting in High Dimensions
Andoni, Alexandr
Indyk, Piotr
Mahabadi, Sepideh
Narayanan, Shyam
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[9] Fast Approximate Near Neighbor Algorithm by Clustering in High Dimensions
Hung Tran-The
Vinh Nguyen Van
Minh Hoang Anh
2015 SEVENTH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE), 2015, : 274 - 279
[10] Probably correct k-nearest neighbor search in high dimensions
Toyama, Jun
Kudo, Mineichi
Imai, Hideyuki
PATTERN RECOGNITION, 2010, 43 (04) : 1361 - 1372

← 1 2 3 4 5 →