Towards Lightweight Black-Box Attacks Against Deep Neural Networks

Cited: 0
Authors
Sun, Chenghao [1]
Zhang, Yonggang [2]
Wan, Chaoqun [3]
Wang, Qizhou [2]
Li, Ya [4]
Liu, Tongliang [5]
Han, Bo [2]
Tian, Xinmei [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Hong Kong Baptist Univ, Hong Kong, Peoples R China
[3] Alibaba Cloud Comp Ltd, Beijing, Peoples R China
[4] IFlytek Res, Hefei, Anhui, Peoples R China
[5] Univ Sydney, Sydney, NSW 2006, Australia
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Funding
Australian Research Council;
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Black-box attacks can generate adversarial examples without accessing the parameters of deep neural networks (DNNs), greatly exacerbating the threat to deployed models. However, previous works state that black-box attacks fail to mislead DNNs when their training data and outputs are inaccessible. In this work, we argue that black-box attacks can pose practical threats even in this highly restrictive scenario, where only a few test samples are available. Specifically, we find that attacking the shallow layers of DNNs trained on a few test samples can generate powerful adversarial examples. Since only a few samples are required, we refer to these attacks as lightweight black-box attacks. The main challenge in promoting lightweight attacks is mitigating the adverse impact of the approximation error in the shallow layers. Because this error is hard to mitigate with few available samples, we propose the Error TransFormer (ETF) for lightweight attacks. Namely, ETF transforms the approximation error in the parameter space into a perturbation in the feature space and alleviates the error by disturbing features. In our experiments, lightweight black-box attacks with the proposed ETF achieve surprising results. For example, even when only one sample per category is available, the attack success rate of lightweight black-box attacks is only about 3% lower than that of black-box attacks using the complete training data.
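The abstract's core idea, perturbing an input so that its shallow-layer features are maximally distorted under a small norm budget, can be sketched with a toy example. Everything below (the single linear-plus-ReLU "shallow layer" `W`, the `feature_attack` routine, and all step sizes) is an illustrative assumption for exposition only, not the paper's actual ETF implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "shallow layer": one linear map + ReLU, standing in for the first
# block of a surrogate network trained on a few test samples.
# (Hypothetical stand-in; ETF operates on real DNN shallow layers.)
W = rng.normal(size=(16, 8))

def shallow_features(x):
    """Features produced by the toy shallow layer."""
    return np.maximum(W @ x, 0.0)

def feature_attack(x, eps=0.1, steps=10, alpha=0.02):
    """Sign-gradient ascent maximizing the shallow-feature distortion
    ||f(x + delta) - f(x)||^2 under an L-infinity budget eps."""
    f0 = shallow_features(x)
    # Small random start so the gradient is nonzero on the first step.
    delta = rng.uniform(-alpha, alpha, size=x.shape)
    for _ in range(steps):
        z = W @ (x + delta)
        f = np.maximum(z, 0.0)
        # d/d(delta) ||f - f0||^2 = 2 * W^T [(f - f0) * 1(z > 0)];
        # the constant factor is irrelevant under the sign step.
        grad = W.T @ ((f - f0) * (z > 0))
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return x + delta

x = rng.normal(size=8)
x_adv = feature_attack(x)
print(np.max(np.abs(x_adv - x)))  # stays within the eps = 0.1 budget
```

The per-step clip keeps the perturbation inside the budget, mirroring the small-perturbation constraint typical of adversarial attacks; maximizing feature distortion rather than a classification loss is what makes the sketch a feature-space attack in the spirit of the abstract.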
Pages: 13