Black-box Adversarial Attacks on Video Recognition Models

Cited by: 82
Authors
Jiang, Linxi [1]
Ma, Xingjun [2]
Chen, Shaoxiang [1]
Bailey, James [2]
Jiang, Yu-Gang [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
Source
Proceedings of the 27th ACM International Conference on Multimedia (MM'19) | 2019
Funding
National Natural Science Foundation of China
Keywords
Adversarial examples; video recognition; black-box attack; model security
DOI
10.1145/3343031.3351088
Chinese Library Classification
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Deep neural networks (DNNs) are known for their vulnerability to adversarial examples. These are examples that have undergone small, carefully crafted perturbations, and which can easily fool a DNN into making misclassifications at test time. Thus far, the field of adversarial research has mainly focused on image models, under either a white-box setting, where an adversary has full access to model parameters, or a black-box setting, where an adversary can only query the target model for probabilities or labels. Whilst several white-box attacks have been proposed for video models, black-box video attacks remain unexplored. To close this gap, we propose the first black-box video attack framework, called V-BAD. V-BAD utilizes tentative perturbations transferred from image models and partition-based rectifications found by NES (Natural Evolution Strategies) to obtain good adversarial gradient estimates with fewer queries to the target model. V-BAD is equivalent to estimating the projection of the adversarial gradient onto a selected subspace. Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models. For the targeted attack, it achieves a >93% success rate using only 3.4∼8.4 × 10^4 queries on average, a similar number of queries to state-of-the-art black-box image attacks, despite the fact that videos often have two orders of magnitude higher dimensionality than static images. We believe V-BAD is a promising new tool for evaluating and improving the robustness of video recognition models to black-box adversarial attacks.
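The NES-based gradient estimation at the core of V-BAD can be illustrated with a minimal sketch of the standard antithetic NES estimator, assuming query-only access to the target model. The names here (nes_gradient_estimate, query_loss) are hypothetical illustrations of the general technique, not the authors' released code:

```python
import numpy as np

def nes_gradient_estimate(query_loss, x, n_samples=50, sigma=1e-3):
    """Estimate the gradient of a black-box loss at x via antithetic NES.

    query_loss: callable returning a scalar loss for a perturbed input,
                computed from the target model's probability/label queries.
    x:          current adversarial example, e.g. a flattened video tensor.
    """
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples // 2):
        u = np.random.randn(*x.shape)  # Gaussian search direction
        # Antithetic pair: two queries per direction, which cancels noise
        # and yields a finite-difference estimate along u.
        grad += (query_loss(x + sigma * u) - query_loss(x - sigma * u)) * u
    return grad / (n_samples * sigma)
```

Per the abstract, V-BAD does not run this estimator over raw pixels: it searches only over partition-based rectifications of a tentative perturbation transferred from an image model, which amounts to estimating the projection of the adversarial gradient onto that lower-dimensional subspace and keeps the query count comparable to black-box image attacks.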
Pages: 864-872
Number of pages: 9