GPU-based Private Information Retrieval for On-Device Machine Learning Inference

Cited by: 2

Authors:

Lam, Maximilian [2]

Johnson, Jeff [1]

Xiong, Wenjie [3]

Maeng, Kiwan [4]

Gupta, Udit [6]

Li, Yang [1]

Lai, Liangzhen [1]

Leontiadis, Ilias [1]

Rhu, Minsoo [1]

Lee, Hsien-Hsin S. [5]

Reddi, Vijay Janapa [2]

Wei, Gu-Yeon [2]

Brooks, David [2]

Suh, G. Edward [1,6]

Affiliations:

[1] Meta AI, Menlo Pk, CA 94025 USA

[2] Harvard Univ, Cambridge, MA 02138 USA

[3] Virginia Tech, Blacksburg, VA USA

[4] Penn State, University Pk, PA USA

[5] Intel, Santa Clara, CA USA

[6] Cornell Univ, Ithaca, NY USA

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 1 | 2024

Keywords:

privacy; security; cryptography; machine learning; GPU; performance; ARCHITECTURE; PROTECTION;

DOI:

10.1145/3617232.3624855

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1-10 GB of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20x over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over 5x additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second (a >100x throughput improvement over a CPU-based baseline) while maintaining model accuracy.
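As a rough illustration of the general PIR idea the abstract refers to, the toy sketch below retrieves one row of a small embedding table without either server learning which row was requested. It is not the scheme or the GPU implementation from the paper. It assumes two non-colluding servers that each hold an identical copy of the table, uses simple additive secret sharing of a one-hot query vector, and all function names, the modulus, and the example table are hypothetical.

```python
import secrets

MOD = 2**32  # all arithmetic is modulo 2^32 (an assumption for this toy)

def make_queries(index, table_size):
    """Split the one-hot vector for `index` into two additive shares.

    Each share on its own is uniformly random, so a single server
    learns nothing about `index` from the share it receives.
    """
    share_a = [secrets.randbelow(MOD) for _ in range(table_size)]
    share_b = [((1 if j == index else 0) - share_a[j]) % MOD
               for j in range(table_size)]
    return share_a, share_b

def server_answer(table, query):
    """Return the query-weighted sum of all rows, modulo MOD."""
    dim = len(table[0])
    answer = [0] * dim
    for weight, row in zip(query, table):
        for k in range(dim):
            answer[k] = (answer[k] + weight * row[k]) % MOD
    return answer

def reconstruct(answer_a, answer_b):
    """Add the two server answers to recover the requested row."""
    return [(a + b) % MOD for a, b in zip(answer_a, answer_b)]

# Hypothetical 4-row embedding table with quantized integer entries.
table = [[10, 11], [20, 21], [30, 31], [40, 41]]
query_a, query_b = make_queries(2, len(table))
row = reconstruct(server_answer(table, query_a),
                  server_answer(table, query_b))
```

In this toy, each server touches every row of the table for every query, which gives some intuition for why PIR can be too expensive for latency-sensitive inference without acceleration.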


Pages: 197-214

Number of pages: 18
