Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

被引:0
|
作者
Aumuller, Martin [1 ]
Pagh, Rasmus [1 ,2 ]
Silvestri, Francesco [3 ]
机构
[1] IT Univ Copenhagen, Copenhagen, Denmark
[2] BARC, Copenhagen, Denmark
[3] Univ Padua, Padua, Italy
来源
PODS'20: PROCEEDINGS OF THE 39TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS | 2020年
关键词
Similarity search; Locality Sensitive Hashing; Fairness; Sampling;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r -near neighbor (r-NN) problem: given a radius r > 0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of equal opportunity: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN where all points in S that are near q have the same probability to be selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights (un)fairness in a recommendation setting on real-world datasets and discusses the inherent unfairness introduced by solving other variants of the problem.
引用
收藏
页码:191 / 204
页数:14
相关论文
共 50 条
  • [1] Fair near neighbor search via sampling
    Aumuller, Martin
    Har-Peled, Sariel
    Mahabadi, Sepideh
    Pagh, Rasmus
    Silvestri, Francesco
    SIGMOD RECORD, 2021, 50 (01) : 42 - 49
  • [2] Technical Perspective: Fair Near Neighbor Search via Sampling
    Zhang, Qin
    SIGMOD RECORD, 2021, 50 (01) : 41 - 41
  • [3] Sampling a Near Neighbor in High Dimensions - Who is the Fairest of Them All?
    Aumuller, Martin
    Har-Peled, Sariel
    Mahabadi, Sepideh
    Pagh, Rasmus
    Silvestri, Francesco
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2022, 47 (01):
  • [4] A simple algorithm for nearest neighbor search in high dimensions
    Nene, SA
    Nayar, SK
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (09) : 989 - 1003
  • [5] Entropy based Nearest Neighbor Search in High Dimensions
    Panigrahy, Rina
    PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 1186 - 1195
  • [6] Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions
    Buabal, Ruben
    Homaifarl, Abdollah
    Hendrix, William
    Son, Seung Woo
    Liao, Wei-keng
    Choudhary, Alok
    JOURNAL OF PATTERN RECOGNITION RESEARCH, 2014, 9 (01): : 111 - 122
  • [7] PARALLEL ALGORITHMS FOR NEAREST NEIGHBOR SEARCH PROBLEMS IN HIGH DIMENSIONS
    Xiao, Bo
    Biros, George
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2016, 38 (05): : S667 - S699
  • [8] Differentially Private Approximate Near Neighbor Counting in High Dimensions
    Andoni, Alexandr
    Indyk, Piotr
    Mahabadi, Sepideh
    Narayanan, Shyam
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Fast Approximate Near Neighbor Algorithm by Clustering in High Dimensions
    Hung Tran-The
    Vinh Nguyen Van
    Minh Hoang Anh
    2015 SEVENTH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE), 2015, : 274 - 279
  • [10] Probably correct k-nearest neighbor search in high dimensions
    Toyama, Jun
    Kudo, Mineichi
    Imai, Hideyuki
    PATTERN RECOGNITION, 2010, 43 (04) : 1361 - 1372