PRADA: Practical Black-box Adversarial Attacks against Neural Ranking Models

Cited by: 7
Authors
Wu, Chen [1 ,2 ]
Zhang, Ruqing [1 ,2 ]
Guo, Jiafeng [1 ,2 ]
De Rijke, Maarten [3 ]
Fan, Yixing [2 ,4 ]
Cheng, Xueqi [2 ,4 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, 6 Kexueyuan South Rd, Beijing 100190, Peoples R China
[3] Univ Amsterdam, NL-1012WX Amsterdam, Netherlands
[4] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adversarial attack; decision-based black-box attack setting; neural ranking models; SPAM DETECTION;
DOI
10.1145/3576923
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Neural ranking models (NRMs) have shown remarkable success in recent years, especially with pre-trained language models. However, deep neural models are notorious for their vulnerability to adversarial examples. Given our increased reliance on neural information retrieval models, adversarial attacks may become a new type of web spamming technique. It is therefore important to study potential adversarial attacks and identify vulnerabilities of NRMs before they are deployed. In this article, we introduce the Word Substitution Ranking Attack (WSRA) task against NRMs, which aims to promote a target document in the rankings by adding adversarial perturbations to its text. We focus on the decision-based black-box attack setting, where attackers cannot directly access the model's internals and can only query the target model to obtain the rank positions of a partial retrieved list. This attack setting is realistic for real-world search engines. We propose a novel Pseudo Relevance-based ADversarial ranking Attack method (PRADA), which learns a surrogate model based on Pseudo Relevance Feedback (PRF) to generate gradients for finding adversarial perturbations. Experiments on two web search benchmark datasets show that PRADA outperforms existing attack strategies and successfully fools the NRM with small, indiscernible perturbations of text.
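To make the procedure above concrete, here is a minimal, self-contained Python sketch of a gradient-guided word-substitution loop of the kind the abstract describes. It is not the authors' implementation: the toy vocabulary, the linear surrogate ranker, and the rank_of stub (standing in for a rank-position query to the black-box target model) are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB = ["cheap", "hotel", "paris", "budget", "stay", "luxury", "deal"]
    EMB = rng.normal(size=(len(VOCAB), 8))   # toy token embeddings
    W = rng.normal(size=8)                   # toy linear surrogate ranker

    def surrogate_score(token_ids):
        # Surrogate relevance score: mean token embedding dotted with W.
        return EMB[token_ids].mean(axis=0) @ W

    def surrogate_grad(token_ids):
        # Gradient of the score w.r.t. each token's embedding (exact for this
        # linear surrogate; a neural surrogate would obtain it by backprop).
        return np.tile(W / len(token_ids), (len(token_ids), 1))

    def rank_of(token_ids):
        # Stand-in for querying the black-box target model: returns the
        # document's rank position (lower is better). Faked from the score.
        return max(1, 10 - int(surrogate_score(token_ids) * 3))

    def substitution_step(token_ids):
        # Greedy step: apply the single word substitution whose first-order
        # gain, grad . (emb[candidate] - emb[current]), is largest.
        grads = surrogate_grad(token_ids)
        best_gain, best_pos, best_cand = 0.0, None, None
        for pos, tid in enumerate(token_ids):
            gains = (EMB - EMB[tid]) @ grads[pos]
            cand = int(np.argmax(gains))
            if gains[cand] > best_gain:
                best_gain, best_pos, best_cand = gains[cand], pos, cand
        out = list(token_ids)
        if best_pos is not None:
            out[best_pos] = best_cand
        return out

    doc = [0, 1, 4]                # target document: "cheap hotel stay"
    for step in range(3):          # small perturbation budget
        doc = substitution_step(doc)
        print(f"step {step}: rank {rank_of(doc)} ->", [VOCAB[t] for t in doc])

In PRADA proper, the surrogate is trained from pseudo relevance feedback on the target model's rankings, and candidate substitutes are constrained to stay semantically close to the original words so the perturbation remains indiscernible; the sketch keeps only the core first-order step of picking the substitution with the largest gradient-aligned gain.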
Pages: 27