Generating Semantic Adversarial Examples via Feature Manipulation in Latent Space

Cited by: 4

Authors:

Wang, Shuo [1]

Chen, Shangyu [2]

Chen, Tianle [3]

Nepal, Surya [1]

Rudolph, Carsten [2]

Grobler, Marthie [1]

Affiliations:

[1] CSIRO, Data61 & Cybersecur CRC, Marsfield, NSW 2122, Australia

[2] Monash Univ, Fac Informat Technol, Melbourne, Vic 3800, Australia

[3] Univ Queensland, St Lucia, Qld 4072, Australia

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 12

Keywords:

Adversarial examples; feature manipulation; latent representation; neural networks; variational autoencoder (VAE)

DOI:

10.1109/TNNLS.2023.3299408

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The susceptibility of deep neural networks (DNNs) to adversarial intrusions, exemplified by adversarial examples, is well-documented. Conventional attacks implement unstructured, pixel-wise perturbations to mislead classifiers, which often results in a noticeable departure from natural samples and lacks human-perceptible interpretability. In this work, we present an adversarial attack strategy that implements fine-granularity, semantic-meaning-oriented structural perturbations. Our proposed methodology manipulates the semantic attributes of images through the use of disentangled latent codes. We engineer adversarial perturbations by manipulating either a single latent code or a combination thereof. To this end, we propose two unsupervised semantic manipulation strategies: one based on vector-disentangled representation and the other on feature map-disentangled representation, taking into consideration the complexity of the latent codes and the smoothness of the reconstructed images. Our empirical evaluations, conducted extensively on real-world image data, showcase the potency of our attacks, particularly against black-box classifiers. Furthermore, we establish the existence of a universal semantic adversarial example that is agnostic to specific images.


Pages: 17070-17084

Page count: 15

References: 39 in total

