Unlearning Backdoor Attacks through Gradient-Based Model Pruning

Times Cited: 0
Authors
Dunnett, Kealan [1 ,2 ]
Arablouei, Reza [2 ]
Miller, Dimity [1 ]
Dedeoglu, Volkan [1 ,2 ]
Jurdak, Raja [1 ]
Affiliations
[1] Queensland Univ Technol, Brisbane, Qld, Australia
[2] CSIRO's Data61, Canberra, ACT, Australia
Source
2024 54TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOPS, DSN-W 2024 | 2024
Keywords
backdoor attack; backdoor mitigation; model pruning; unlearning;
DOI
10.1109/DSN-W60302.2024.00021
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the era of increasing concerns over cybersecurity threats, defending against backdoor attacks is paramount in ensuring the integrity and reliability of machine learning models. However, many existing approaches require substantial amounts of data for effective mitigation, posing significant challenges in practical deployment. To address this, we propose a novel approach to counter backdoor attacks by treating their mitigation as an unlearning task. We tackle this challenge through a targeted model pruning strategy, leveraging unlearning loss gradients to identify and eliminate backdoor elements within the model. Built on solid theoretical insights, our approach offers simplicity and effectiveness, rendering it well-suited for scenarios with limited data availability. Our methodology includes formulating a suitable unlearning loss and devising a model-pruning technique tailored for convolutional neural networks. Comprehensive evaluations demonstrate the efficacy of the proposed approach compared with state-of-the-art methods, particularly in realistic data settings.
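The abstract's core idea, scoring convolutional filters by their unlearning-loss gradients and pruning the highest-scoring ones, can be sketched as follows. This is a hedged, minimal illustration, not the paper's actual algorithm: the function names (`score_filters`, `prune_filters`), the L2-norm scoring rule, and the fixed pruning ratio are all assumptions made for clarity.

```python
import numpy as np

def score_filters(grads):
    # Score each convolutional filter by the L2 norm of its unlearning-loss
    # gradient. The intuition: filters whose weights the unlearning loss
    # pushes hardest are the most likely to encode the backdoor.
    # grads has shape (num_filters, k, k, in_channels).
    return np.linalg.norm(grads.reshape(grads.shape[0], -1), axis=1)

def prune_filters(weights, grads, ratio=0.25):
    # Zero out (prune) the fraction `ratio` of filters with the largest
    # gradient scores; returns the pruned weights and the pruned indices.
    scores = score_filters(grads)
    k = max(1, int(ratio * len(scores)))
    idx = np.argsort(scores)[-k:]
    pruned = weights.copy()
    pruned[idx] = 0.0
    return pruned, idx
```

A real implementation would compute `grads` by backpropagating the unlearning loss on the limited available data, and would prune layer by layer; this sketch only shows the scoring-and-masking step in isolation.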
Pages: 46-54
Page count: 9