Towards Defending against Adversarial Examples via Attack-Invariant Features

Cited by: 0
Authors
Zhou, Dawei [1 ,2 ]
Liu, Tongliang [2 ]
Han, Bo [3 ]
Wang, Nannan [1 ]
Peng, Chunlei [4 ]
Gao, Xinbo [5 ]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, State Key Lab Integrated Serv Networks, Xian, Shaanxi, Peoples R China
[2] Univ Sydney, Sch Comp Sci, Trustworthy Machine Learning Lab, Sydney, NSW, Australia
[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
[4] Xidian Univ, State Key Lab Integrated Serv Networks, Sch Cyber Engn, Xian, Shaanxi, Peoples R China
[5] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Image Cognit, Chongqing, Peoples R China
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Funding
National Natural Science Foundation of China; Australian Research Council;
Keywords
CORTEX;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by exploiting adversarial examples. However, given the continuously evolving attacks, models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples. To solve this problem, in this paper, we propose to remove adversarial noise by learning generalizable features that are invariant across attacks and maintain semantic classification information. Specifically, we introduce an adversarial feature learning mechanism to disentangle invariant features from adversarial noise. A normalization term is further introduced in the encoded space of the attack-invariant features to address the bias between seen and unseen types of attacks. Empirical evaluations demonstrate that our method provides better protection than previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.
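The abstract describes a disentanglement idea: encode an adversarial input into an attack-invariant code (carrying the semantic class information) and an attack-specific code (carrying the noise), classify from the invariant code, and regularize the encoded invariant space so that seen and unseen attack types are not treated differently. The sketch below is only an illustration of that idea under assumed choices; the network architecture, the alignment loss, the form of the normalization term, and the loss weight `lam` are hypothetical and not taken from the paper.

```python
# Minimal sketch of attack-invariant feature learning (assumed formulation,
# not the authors' exact method). Requires PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentanglingEncoder(nn.Module):
    """Splits an image into an attack-invariant code and an
    attack-specific code (hypothetical architecture)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.invariant_head = nn.Linear(64, feat_dim)  # semantic content
        self.specific_head = nn.Linear(64, feat_dim)   # attack-dependent part

    def forward(self, x):
        h = self.backbone(x)
        return self.invariant_head(h), self.specific_head(h)

def training_loss(encoder, classifier, x_adv, x_clean, y, lam=0.1):
    """One possible objective: classify from the invariant code, align it
    with the clean image's code, and normalize the encoded space to reduce
    bias between seen and unseen attack types (assumed loss form)."""
    z_inv_adv, _ = encoder(x_adv)
    z_inv_clean, _ = encoder(x_clean)
    cls_loss = F.cross_entropy(classifier(z_inv_adv), y)      # keep class info
    align_loss = F.mse_loss(z_inv_adv, z_inv_clean)           # invariance to the attack
    norm_loss = (z_inv_adv.norm(dim=1) - 1.0).pow(2).mean()   # normalization term
    return cls_loss + align_loss + lam * norm_loss

# Example usage with random tensors standing in for a batch of
# clean/adversarial image pairs:
if __name__ == "__main__":
    enc, clf = DisentanglingEncoder(), nn.Linear(128, 10)
    x_clean = torch.randn(4, 3, 32, 32)
    x_adv = x_clean + 0.03 * torch.randn_like(x_clean)  # placeholder "attack"
    y = torch.randint(0, 10, (4,))
    print(training_loss(enc, clf, x_adv, x_clean, y))
```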
Pages: 11