Privacy-Preserving Machine Learning on Apache Spark

Cited by: 1
Authors
Brito, Claudia V. [1 ,2 ]
Ferreira, Pedro G. [1 ,3 ]
Portela, Bernardo L. [1 ,3 ]
Oliveira, Rui C. [1 ,2 ]
Paulo, Joao T. [1 ,2 ]
Affiliations
[1] INESC TEC, P-4200465 Porto, Portugal
[2] Univ Minho, Dept Informat, P-4710057 Braga, Portugal
[3] Univ Porto, Fac Sci, P-4099002 Porto, Portugal
Keywords
Cluster computing; training; machine learning; hardware; task analysis; homomorphic encryption; distributed computing; trusted computing; privacy-preserving machine learning; distributed systems; Apache Spark; trusted execution environments; Intel SGX; security; attacks
DOI
10.1109/ACCESS.2023.3332222
Chinese Library Classification (CLC)
TP [automation and computer technology]
Discipline Code
0812
Abstract
The adoption of third-party machine learning (ML) cloud services depends heavily on the security guarantees they provide and the performance penalty they impose on model training and inference workloads. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g., statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g., Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed inside trusted enclaves, we introduce a hybrid scheme that combines computation done inside and outside these enclaves. Our experimental evaluation shows that this approach reduces the runtime of ML algorithms by up to 41% compared to previous related work. The protocol is accompanied by a security proof and a discussion of its resilience against a wide spectrum of ML attacks.
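The hybrid scheme summarized above can be illustrated with a minimal sketch: a dispatcher that routes whitelisted non-sensitive statistics to untrusted workers and everything else to an enclave. This is not Soteria's actual implementation; all names here (`NON_SENSITIVE_OPS`, `run_in_enclave`, `run_untrusted`, `dispatch`) are hypothetical placeholders for the paper's idea of revealing chosen operations to reduce enclave overhead.

```python
# Illustrative sketch only, not the authors' code. Models the key insight:
# carefully chosen non-sensitive operations (e.g. simple statistics) may be
# revealed and run outside the enclave, while sensitive computation stays
# inside a trusted execution environment such as Intel SGX.

NON_SENSITIVE_OPS = {"count", "mean", "histogram"}  # assumed reveal set

def run_in_enclave(op, data):
    # Placeholder for computation executed inside an SGX enclave.
    return ("enclave", op(data))

def run_untrusted(op, data):
    # Placeholder for computation on plain, untrusted Spark workers.
    return ("untrusted", op(data))

def dispatch(op_name, op, data):
    """Hybrid scheduling: reveal only whitelisted statistics."""
    if op_name in NON_SENSITIVE_OPS:
        return run_untrusted(op, data)
    return run_in_enclave(op, data)

data = [1.0, 2.0, 3.0, 4.0]
# "mean" is in the reveal set, so it runs on untrusted workers.
where, value = dispatch("mean", lambda d: sum(d) / len(d), data)
# A model-update step is treated as sensitive and stays in the enclave.
where2, _ = dispatch("gradient_update", lambda d: [x * 0.1 for x in d], data)
```

The design choice this sketches is the paper's trade-off: shrinking the set of enclave-bound operations cuts runtime, while the reveal set must be chosen so the disclosed values do not enable ML attacks.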
Pages: 127907-127930
Page count: 24