Privacy-Preserving Machine Learning on Apache Spark

Cited by: 1
Authors
Brito, Claudia V. [1 ,2 ]
Ferreira, Pedro G. [1 ,3 ]
Portela, Bernardo L. [1 ,3 ]
Oliveira, Rui C. [1 ,2 ]
Paulo, Joao T. [1 ,2 ]
Affiliations
[1] INESC TEC, P-4200465 Porto, Portugal
[2] Univ Minho, Dept Informat, P-4710057 Braga, Portugal
[3] Univ Porto, Fac Sci, P-4099002 Porto, Portugal
Keywords
Cluster computing; training; machine learning; hardware; task analysis; homomorphic encryption; distributed computing; trusted computing; privacy-preserving machine learning; distributed systems; Apache Spark; trusted execution environments; Intel SGX; security; attacks
DOI
10.1109/ACCESS.2023.3332222
Chinese Library Classification (CLC)
TP [automation and computer technology]
Discipline Code
0812
Abstract
The adoption of third-party machine learning (ML) cloud services depends heavily on the security guarantees they provide and the performance penalty they impose on model training and inference workloads. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g., statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g., Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed inside trusted enclaves, we introduce a hybrid scheme that combines computation done inside and outside these enclaves. Our experimental evaluation shows that this approach reduces the runtime of ML algorithms by up to 41% compared to previous related work. The protocol is accompanied by a security proof and a discussion of its resilience against a wide spectrum of ML attacks.
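The hybrid scheme summarized above can be illustrated with a minimal sketch: a dispatcher that routes whitelisted non-sensitive statistics to untrusted workers and everything else to an enclave. This is not Soteria's actual implementation; all names here (`NON_SENSITIVE_OPS`, `run_in_enclave`, `run_untrusted`, `dispatch`) are hypothetical placeholders for the paper's idea of revealing chosen operations to reduce enclave overhead.

```python
# Illustrative sketch only, not the authors' code. Models the key insight:
# carefully chosen non-sensitive operations (e.g. simple statistics) may be
# revealed and run outside the enclave, while sensitive computation stays
# inside a trusted execution environment such as Intel SGX.

NON_SENSITIVE_OPS = {"count", "mean", "histogram"}  # assumed reveal set

def run_in_enclave(op, data):
    # Placeholder for computation executed inside an SGX enclave.
    return ("enclave", op(data))

def run_untrusted(op, data):
    # Placeholder for computation on plain, untrusted Spark workers.
    return ("untrusted", op(data))

def dispatch(op_name, op, data):
    """Hybrid scheduling: reveal only whitelisted statistics."""
    if op_name in NON_SENSITIVE_OPS:
        return run_untrusted(op, data)
    return run_in_enclave(op, data)

data = [1.0, 2.0, 3.0, 4.0]
# "mean" is in the reveal set, so it runs on untrusted workers.
where, value = dispatch("mean", lambda d: sum(d) / len(d), data)
# A model-update step is treated as sensitive and stays in the enclave.
where2, _ = dispatch("gradient_update", lambda d: [x * 0.1 for x in d], data)
```

The design choice this sketches is the paper's trade-off: shrinking the set of enclave-bound operations cuts runtime, while the reveal set must be chosen so the disclosed values do not enable ML attacks.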
Pages: 127907-127930
Page count: 24