Smaller, Faster & Lighter KNN Graph Constructions

被引:7
作者
Guerraoui, Rachid [1 ]
Kermarrec, Anne-Marie [1 ,2 ]
Ruas, Olivier [3 ]
Taiani, Francois [4 ]
机构
[1] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[2] Mediego, Cesson Sevigne, France
[3] Peking Univ, Beijing, Peoples R China
[4] Univ Rennes, INRIA, CNRS, IRISA, Rennes, France
来源
WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) | 2020年
关键词
D O I
10.1145/3366423.3380184
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We propose GoldFinger, a new compact and fast-to-compute binary representation of datasets to approximate Jaccard's index. We illustrate the effectiveness of GoldFinger on the emblematic big data problem of K-Nearest-Neighbor (KNN) graph construction and show that GoldFinger can drastically accelerate a large range of existing KNN algorithms with little to no overhead. As a side effect, we also show that the compact representation of the data protects users' privacy for free by providing k-anonymity and /-diversity. Our extensive evaluation of the resulting approach on several realistic datasets shows that our approach delivers speedups of up to 78.9% compared to the use of raw data while only incurring a negligible to moderate loss in terms of KNN quality. To convey the practical value of such a scheme, we apply it to item recommendation and show that the loss in recommendation quality is negligible.
引用
收藏
页码:1060 / 1070
页数:11
相关论文
共 45 条
  • [1] Alaggan M, 2012, SSS
  • [2] Identifying Global Icebergs in Distributed Streams
    Anceaume, Emmanuelle
    Busnel, Yann
    Rivetti, Nicolo
    Sericola, Bruno
    [J]. 2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS), 2015, : 266 - 275
  • [3] [Anonymous], 2006, ICML
  • [4] [Anonymous], 2002, STOC
  • [5] [Anonymous], 2002, AUTOMATA LANGUAGES P
  • [6] [Anonymous], 2011, P 20 INT C WORLD WID
  • [7] [Anonymous], 2010, MIDDLEWARE
  • [8] [Anonymous], 2012, CORR
  • [9] [Anonymous], 2006, ICDE
  • [10] [Anonymous], 2012, ICDE