Private Web Search with Tiptoe

被引:8
作者
Henzinger, Alexandra [1 ]
Dauterman, Emma [2 ]
Corrigan-Gibbs, Henry [1 ]
Zeldovich, Nickolai [1 ]
机构
[1] MIT, Cambridge, MA 02139 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
来源
PROCEEDINGS OF THE TWENTY-NINTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, SOSP 2023 | 2023年
基金
美国国家科学基金会;
关键词
INFORMATION-RETRIEVAL; DATABASE;
D O I
10.1145/3600006.3613134
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Tiptoe is a private web search engine that allows clients to search over hundreds of millions of documents, while revealing no information about their search query to the search engine's servers. Tiptoe's privacy guarantee is based on cryptography alone; it does not require hardware enclaves or non-colluding servers. Tiptoe uses semantic embeddings to reduce the problem of private full-text search to private nearest-neighbor search. Then, Tiptoe implements private nearest-neighbor search with a new, high-throughput protocol based on linearly homomorphic encryption. Running on a 45-server cluster, Tiptoe can privately search over 360 million web pages with 145 core-seconds of server compute, 56.9 MiB of client-server communication (74% of which occurs before the client enters its search query), and 2.7 seconds of end-to-end latency. Tiptoe's search works best on conceptual queries ("knee pain") and less well on exact string matches ("123 Main Street, New York"). On the MS MARCO search-quality benchmark, Tiptoe ranks the best-matching result in position 7.7 on average. This is worse than a state-of-the-art, nonprivate neural search algorithm (average rank: 2.3), but is close to the classical tf-idf algorithm (average rank: 6.7). Finally, Tiptoe is extensible: it also supports private text-to-image search and, with minor modifications, it can search over audio, code, and more.
引用
收藏
页码:396 / +
页数:30
相关论文
共 137 条
  • [1] Pantheon: Private Retrieval from Public Key-Value Store
    Ahmad, Ishtiyaque
    Agrawal, Divyakant
    El Abbadi, Amr
    Gupta, Trinabh
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 16 (04): : 643 - 656
  • [2] Coeus: A System for Oblivious Document Ranking and Retrieval
    Ahmad, Ishtiyaque
    Sarker, Laboni
    Agrawal, Divyakant
    El Abbadi, Amr
    Gupta, Trinabh
    [J]. PROCEEDINGS OF THE 28TH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, SOSP 2021, 2021, : 672 - 690
  • [3] On the concrete hardness of Learning with Errors
    Albrecht, Martin R.
    Player, Rachel
    Scott, Sam
    [J]. JOURNAL OF MATHEMATICAL CRYPTOLOGY, 2015, 9 (03) : 169 - 203
  • [4] Ali A, 2021, PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, P1811
  • [5] PIR with compressed queries and amortized query processing
    Angel, Sebastian
    Chen, Hao
    Laine, Kim
    Setty, Srinath
    [J]. 2018 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2018, : 962 - 979
  • [6] Angel S, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P551
  • [7] [Anonymous], 2023, MICROSOFT SEAL RELEA
  • [8] Apache Software Foundation, 2023, Lucene
  • [9] Versatile Query Scrambling for Private Web Search
    Arampatzis, Avi
    Drosatos, George
    Efraimidis, Pavlos S.
    [J]. INFORMATION RETRIEVAL JOURNAL, 2015, 18 (04): : 331 - 358
  • [10] A Privacy-Preserving Similarity Search Scheme over Encrypted Word Embeddings
    Aritomo, Daisuke
    Watanabe, Chiemi
    Matsubara, Masaki
    Morishima, Atsuyuki
    [J]. IIWAS2019: THE 21ST INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES, 2019, : 403 - 412