GPU-based Private Information Retrieval for On-Device Machine Learning Inference

Cited by: 0
Authors
Lam, Maximilian [2]
Johnson, Jeff [1]
Xiong, Wenjie [3]
Maeng, Kiwan [4]
Gupta, Udit [6]
Li, Yang [1]
Lai, Liangzhen [1]
Leontiadis, Ilias [1]
Rhu, Minsoo [1]
Lee, Hsien-Hsin S. [5]
Reddi, Vijay Janapa [2]
Wei, Gu-Yeon [2]
Brooks, David [2]
Suh, G. Edward [1, 6]
Affiliations
[1] Meta AI, Menlo Pk, CA 94025 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
[3] Virginia Tech, Blacksburg, VA USA
[4] Penn State, University Pk, PA USA
[5] Intel, Santa Clara, CA USA
[6] Cornell Univ, Ithaca, NY USA
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 1 | 2024
Keywords
privacy; security; cryptography; machine learning; GPU; performance; protection
DOI
10.1145/3617232.3624855
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1-10 GB of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20x over an optimized CPU PIR implementation, and our PIR-ML co-design provides an additional throughput improvement of more than 5x at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second (a >100x throughput improvement over a CPU-based baseline) while maintaining model accuracy.
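To make the idea of privately retrieving an embedding row concrete, here is a minimal sketch of a textbook two-server, XOR-based PIR scheme. It is an illustration only and is not taken from the paper: the abstract does not say which PIR construction the authors use, and the sketch assumes two non-colluding servers that each hold a full copy of the table. All names (`make_queries`, `answer`, `reconstruct`, `table`) are hypothetical.

```python
import secrets

def make_queries(index, n):
    """Client: split a one-hot selection vector for `index` into two XOR shares.

    Each share on its own is a uniformly random bit vector, so neither
    server learns which row is being requested (assuming no collusion).
    """
    share_a = [secrets.randbits(1) for _ in range(n)]
    share_b = [bit ^ (1 if i == index else 0) for i, bit in enumerate(share_a)]
    return share_a, share_b

def answer(table, share):
    """Server: XOR together every row whose share bit is 1."""
    acc = bytes(len(table[0]))  # all-zero accumulator, one row wide
    for row, bit in zip(table, share):
        if bit:
            acc = bytes(x ^ y for x, y in zip(acc, row))
    return acc

def reconstruct(ans_a, ans_b):
    """Client: XOR the two answers; every row except the target cancels out."""
    return bytes(x ^ y for x, y in zip(ans_a, ans_b))

# Toy "embedding table": 8 rows of 4 bytes each (placeholder data).
table = [bytes([i] * 4) for i in range(8)]
qa, qb = make_queries(5, len(table))
row = reconstruct(answer(table, qa), answer(table, qb))
```

Each server scans the whole table for every query, which hints at why off-the-shelf PIR is computationally heavy and why acceleration on parallel hardware is attractive.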
Pages: 197-214
Page count: 18
Related papers
50 in total
  • [21] Accelerated Machine Learning for On-Device Hardware-Assisted Cybersecurity in Edge Platforms
    Makrani, Hosein Mohammadi
    He, Zhangying
    Rafatirad, Setareh
    Sayadi, Hossein
    PROCEEDINGS OF THE TWENTY THIRD INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2022), 2022, : 77 - 83
  • [22] GPU-based MapReduce for large-scale near-duplicate video retrieval
    Hanli Wang
    Fengkuangtian Zhu
    Bo Xiao
    Lei Wang
    Yu-Gang Jiang
    Multimedia Tools and Applications, 2015, 74 : 10515 - 10534
  • [23] An Efficient and Accurate GPU-based Deep Learning Model for Multimedia Recommendation
    Djenouri, Youcef
    Belhadi, Asma
    Srivastava, Gautam
    Lin, Jerry Chun-Wei
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (02)
  • [24] Comparison of Deep Learning in Neural Networks on CPU and GPU-based frameworks
    Aida-zade, Kamil
    Mustafayev, Elshan
    Rustamov, Samir
    2017 11TH IEEE INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT 2017), 2017, : 95 - 98
  • [25] GPU-based Stereo Matching Algorithm with the Strategy of Population-based Incremental Learning
    Nie, Dong-Hu
    Han, Kyu-Phil
    Lee, Heng-Suk
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2009, 5 (02): : 105 - 116
  • [26] GPU-Accelerated Machine Learning Inference as a Service for Computing in Neutrino Experiments
    Wang, Michael
    Yang, Tingjun
    Flechas, Maria Acosta
    Harris, Philip
    Hawks, Benjamin
    Holzman, Burt
    Knoepfel, Kyle
    Krupa, Jeffrey
    Pedro, Kevin
    Tran, Nhan
    FRONTIERS IN BIG DATA, 2021, 3
  • [27] A novel image model for vehicle classification in restricted areas using on-device machine learning
    Lamba A.
    Kumar V.
    International Journal of Information Technology, 2023, 15 (6) : 3037 - 3043
  • [28] On-Device Machine Learning for Diagnosis of Parkinson's Disease from Hand Drawn Artifacts
    Venkata, Sai Vaibhav Polisetti
    Sabat, Shubhankar
    Deshpande, Chinmay Anand
    Arefeen, Asiful
    Peterson, Daniel
    Ghasemzadeh, Hassan
    2022 IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS (BHI'22) JOINTLY ORGANISED WITH THE IEEE-EMBS INTERNATIONAL CONFERENCE ON WEARABLE AND IMPLANTABLE BODY SENSOR NETWORKS (BSN'22), 2022,
  • [29] Applying machine learning to text segmentation for information retrieval
    Huang, XJ
    Peng, FC
    Schuurmans, D
    Cercone, N
    Robertson, SE
    INFORMATION RETRIEVAL, 2003, 6 (3-4): : 333 - 362
  • [30] On Machine Learning and Knowledge Organization in Multimedia Information Retrieval
    Macfarlane, Andrew
    Missaoui, Sondess
    Frankowska-Takhari, Sylwia
    KNOWLEDGE ORGANIZATION, 2020, 47 (01): : 45 - 55