MLPrivacyGuard: Defeating Confidence Information based Model Inversion Attacks on Machine Learning Systems

Times Cited: 5
Authors
Alves, Tiago A. O. [1 ]
Franca, Felipe M. G. [2 ]
Kundu, Sandip [3 ]
Affiliations
[1] Univ Estado Rio De Janeiro, Rio De Janeiro, Brazil
[2] Univ Fed Rio de Janeiro, Rio De Janeiro, Brazil
[3] Univ Massachusetts, Amherst, MA 01003 USA
Source
GLSVLSI '19: Proceedings of the 2019 Great Lakes Symposium on VLSI | 2019
Keywords
Neural Networks; Convolutional Neural Network; Adversarial Machine Learning; Model Inversion Attack;
DOI
10.1145/3299874.3319457
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
As services based on Machine Learning (ML) applications find increasing use, there is a growing risk of attacks against such systems. Adversarial machine learning has recently received a lot of attention, where an adversary crafts or manipulates an input to cause an ML system to misclassify it. Another attack of concern, and the subject of this paper, arises when an adversary with access to an ML model can reverse engineer attributes of a target class, creating a privacy risk. Such attacks use non-sensitive data obtainable by the adversary, together with the confidence levels returned by the ML model, to infer sensitive attributes of the target user. Model inversion attacks may be classified as white-box, where the ML model is known to the attacker, or black-box, where the adversary does not know the internals of the model. If the attacker has access to non-sensitive data of a target user, they can infer sensitive data by applying gradient ascent on the confidence returned by the model; in the black-box setting, the gradient ascent can still be performed by approximating the gradient numerically. In this work, we present MLPrivacyGuard, a countermeasure against black-box model inversion attacks. The countermeasure consists of adding controlled noise to the output of the confidence function. It is important to preserve the accuracy of prediction/classification for the legitimate users of the model while preventing attackers from inferring sensitive data, which involves a trade-off between misclassification error and the effectiveness of the defense. Our experimental results demonstrate that when the injected noise follows a long-tailed distribution, low misclassification error and a strong defense can be attained simultaneously: model inversion attacks are neutralized because the numerical approximation of gradient ascent fails to converge.
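To make the mechanisms described in the abstract concrete, the sketch below is a minimal illustration (not taken from the paper) of black-box model inversion via finite-difference gradient ascent on a toy confidence function, and of a MLPrivacyGuard-style defense that perturbs the returned confidence with long-tailed noise. The linear-sigmoid toy model, the choice of Laplace noise, and all parameter values are illustrative assumptions; the paper's actual target models and noise distribution may differ.

# Minimal Python/NumPy sketch of the attack and defense described above.
# The toy model, the Laplace noise, and all constants are illustrative
# assumptions, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a deployed model: confidence of the target class for input x.
W = rng.normal(size=64)

def confidence(x):
    # Undefended confidence output (logistic score of a linear model).
    return 1.0 / (1.0 + np.exp(-(W @ x)))

def guarded_confidence(x, scale=0.05):
    # Defense sketch: add long-tailed (here Laplace) noise to the returned
    # confidence and clip so it remains a valid probability.
    return float(np.clip(confidence(x) + rng.laplace(0.0, scale), 0.0, 1.0))

def black_box_inversion(conf_fn, dim=64, steps=200, lr=0.5, eps=1e-3):
    # Black-box model inversion: gradient ascent on the confidence, with the
    # gradient estimated by central finite differences (no model internals used).
    x = np.zeros(dim)
    for _ in range(steps):
        grad = np.empty(dim)
        for i in range(dim):
            e = np.zeros(dim)
            e[i] = eps
            grad[i] = (conf_fn(x + e) - conf_fn(x - e)) / (2.0 * eps)
        x += lr * grad
    return x

def cosine(a, b):
    # Alignment between the reconstructed input and the target direction W.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

x_plain = black_box_inversion(confidence)
x_guarded = black_box_inversion(guarded_confidence)
print("alignment with target direction, undefended:", cosine(x_plain, W))
print("alignment with target direction, defended:  ", cosine(x_guarded, W))

Against the undefended confidence the reconstruction aligns closely with the target direction, whereas with noisy confidences the finite-difference gradient estimates are dominated by the injected noise and the ascent fails to converge toward the target, mirroring the behavior claimed in the abstract.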
Pages: 411-415
Page count: 5