Stronger data poisoning attacks break data sanitization defenses

Cited by: 115
Authors
Koh, Pang Wei [1 ]
Steinhardt, Jacob [2 ]
Liang, Percy [1 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA USA
Keywords
Data poisoning; Data sanitization; Anomaly detection; Security; Learning halfspaces
DOI
10.1007/s10994-021-06119-y
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding just 3% poisoned data, our attacks increase test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. In contrast, existing attacks that do not explicitly account for these data sanitization defenses are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. As this optimization involves solving an expensive bilevel problem, our three attacks correspond to different ways of approximating this problem, based on influence functions; minimax duality; and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the need to develop more robust defenses against data poisoning attacks.
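The sanitize-then-train pipeline the abstract describes can be made concrete with a short sketch. This is a minimal illustration, not the authors' implementation: the centroid-distance ("sphere"-style) anomaly score, the 0.95 quantile threshold, and the scikit-learn logistic regression model are all assumptions chosen for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sanitize_by_centroid_distance(X, y, quantile=0.95):
    """Flag training points that lie far (in L2 norm) from their class centroid.

    Points whose distance to the class mean exceeds the per-class
    `quantile` of distances are treated as anomalous and dropped,
    mirroring a sphere-style data sanitization defense.
    """
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        keep[idx] = dists <= np.quantile(dists, quantile)
    return keep

def train_with_sanitization(X_train, y_train):
    # Defense: filter anomalous points first, then train on the remainder.
    keep = sanitize_by_centroid_distance(X_train, y_train)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[keep], y_train[keep])
    return model
```

The attacks in the paper succeed against exactly this kind of filter: by concentrating the poisoned points near one another inside the region the detector accepts (enforced as constraints in the attack's optimization problem), the poison passes through the filter yet still shifts the trained model.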
Pages: 1-47
Page count: 47