Weight Poisoning Attacks on Pre-trained Models

被引:0
|
作者
Kurita, Keita [1 ]
Michel, Paul [1 ]
Neubig, Graham [1 ]
机构
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
来源
58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020) | 2020年
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then tine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct "weight poisoning" attacks where pre-trained weights are injected with vulnerabilities that expose "backdoors" afterfine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, arid wain detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at https://github.corn/neulab/RIPPLe.
引用
收藏
页码:2793 / 2806
页数:14
相关论文
共 50 条
  • [1] Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
    Li, Linyang
    Song, Demin
    Li, Xiaonan
    Zeng, Jiehang
    Ma, Ruotian
    Qiu, Xipeng
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3023 - 3032
  • [2] Aliasing Backdoor Attacks on Pre-trained Models
    Wei, Cheng'an
    Lee, Yeonjoon
    Chen, Kai
    Meng, Guozhu
    Lv, Peizhuo
    PROCEEDINGS OF THE 32ND USENIX SECURITY SYMPOSIUM, 2023, : 2707 - 2724
  • [3] Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors
    Lu, Yiwei
    Yang, Matthew Y. R.
    Kamath, Gautam
    Yu, Yaoliang
    IEEE CONFERENCE ON SAFE AND TRUSTWORTHY MACHINE LEARNING, SATML 2024, 2024, : 327 - 343
  • [4] Manipulating Pre-Trained Encoder for Targeted Poisoning Attacks in Contrastive Learning
    Chen, Jian
    Gao, Yuan
    Liu, Gaoyang
    Abdelmoniem, Ahmed M.
    Wang, Chen
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 2412 - 2424
  • [5] UOR: Universal Backdoor Attacks on Pre-trained Language Models
    Du, Wei
    Li, Peixuan
    Zhao, Haodong
    Ju, Tianjie
    Ren, Ge
    Liu, Gongshen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7865 - 7877
  • [6] Multi-target Backdoor Attacks for Code Pre-trained Models
    Li, Yanzhou
    Liu, Shangqing
    Chen, Kangjie
    Xie, Xiaofei
    Zhang, Tianwei
    Liu, Yang
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7236 - 7254
  • [7] Pre-trained Trojan Attacks for Visual Recognition
    Liu, Aishan
    Liu, Xianglong
    Zhang, Xinwei
    Xiao, Yisong
    Zhou, Yuguang
    Liang, Siyuan
    Wang, Jiakai
    Cao, Xiaochun
    Tao, Dacheng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [8] PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning
    Du, Wei
    Zhao, Yichun
    Li, Boqun
    Liu, Gongshen
    Wang, Shilin
    PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022, : 680 - 686
  • [9] Adversarial Attacks on Pre-trained Deep Learning Models for Encrypted Traffic Analysis
    Seok, Byoungjin
    Sohn, Kiwook
    JOURNAL OF WEB ENGINEERING, 2024, 23 (06): : 749 - 768
  • [10] Backdoor Attacks Against Transfer Learning With Pre-Trained Deep Learning Models
    Wang, Shuo
    Nepal, Surya
    Rudolph, Carsten
    Grobler, Marthie
    Chen, Shangyu
    Chen, Tianle
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (03) : 1526 - 1539