Weight Poisoning Attacks on Pre-trained Models

被引：0

作者：

Kurita, Keita ^{[1
]}

Michel, Paul ^{[1
]}

Neubig, Graham ^{[1
]}

机构：

[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA

来源：

58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020) | 2020年

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then tine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct "weight poisoning" attacks where pre-trained weights are injected with vulnerabilities that expose "backdoors" afterfine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, arid wain detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at https://github.corn/neulab/RIPPLe.

引用

页码：2793 / 2806

页数：14

共 50 条

[1] Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
Li, Linyang
Song, Demin
Li, Xiaonan
Zeng, Jiehang
Ma, Ruotian
Qiu, Xipeng
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3023 - 3032
[2] Aliasing Backdoor Attacks on Pre-trained Models
Wei, Cheng'an
Lee, Yeonjoon
Chen, Kai
Meng, Guozhu
Lv, Peizhuo
PROCEEDINGS OF THE 32ND USENIX SECURITY SYMPOSIUM, 2023, : 2707 - 2724
[3] Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors
Lu, Yiwei
Yang, Matthew Y. R.
Kamath, Gautam
Yu, Yaoliang
IEEE CONFERENCE ON SAFE AND TRUSTWORTHY MACHINE LEARNING, SATML 2024, 2024, : 327 - 343
[4] Manipulating Pre-Trained Encoder for Targeted Poisoning Attacks in Contrastive Learning
Chen, Jian
Gao, Yuan
Liu, Gaoyang
Abdelmoniem, Ahmed M.
Wang, Chen
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 2412 - 2424
[5] UOR: Universal Backdoor Attacks on Pre-trained Language Models
Du, Wei
Li, Peixuan
Zhao, Haodong
Ju, Tianjie
Ren, Ge
Liu, Gongshen
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7865 - 7877
[6] Multi-target Backdoor Attacks for Code Pre-trained Models
Li, Yanzhou
Liu, Shangqing
Chen, Kangjie
Xie, Xiaofei
Zhang, Tianwei
Liu, Yang
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7236 - 7254
[7] Pre-trained Trojan Attacks for Visual Recognition
Liu, Aishan
Liu, Xianglong
Zhang, Xinwei
Xiao, Yisong
Zhou, Yuguang
Liang, Siyuan
Wang, Jiakai
Cao, Xiaochun
Tao, Dacheng
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
[8] PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning
Du, Wei
Zhao, Yichun
Li, Boqun
Liu, Gongshen
Wang, Shilin
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022, : 680 - 686
[9] Adversarial Attacks on Pre-trained Deep Learning Models for Encrypted Traffic Analysis
Seok, Byoungjin
Sohn, Kiwook
JOURNAL OF WEB ENGINEERING, 2024, 23 (06): : 749 - 768
[10] Backdoor Attacks Against Transfer Learning With Pre-Trained Deep Learning Models
Wang, Shuo
Nepal, Surya
Rudolph, Carsten
Grobler, Marthie
Chen, Shangyu
Chen, Tianle
IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (03) : 1526 - 1539

← 1 2 3 4 5 →