Black-box Adversarial Attacks on Video Recognition Models

Cited by: 82
Authors
Jiang, Linxi [1]
Ma, Xingjun [2]
Chen, Shaoxiang [1]
Bailey, James [2]
Jiang, Yu-Gang [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
Source
Proceedings of the 27th ACM International Conference on Multimedia (MM'19) | 2019
Funding
National Natural Science Foundation of China
Keywords
Adversarial examples; video recognition; black-box attack; model security
DOI
10.1145/3343031.3351088
Chinese Library Classification
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Deep neural networks (DNNs) are known for their vulnerability to adversarial examples. These are examples that have undergone small, carefully crafted perturbations, and which can easily fool a DNN into making misclassifications at test time. Thus far, the field of adversarial research has mainly focused on image models, under either a white-box setting, where an adversary has full access to model parameters, or a black-box setting, where an adversary can only query the target model for probabilities or labels. Whilst several white-box attacks have been proposed for video models, black-box video attacks remain unexplored. To close this gap, we propose the first black-box video attack framework, called V-BAD. V-BAD utilizes tentative perturbations transferred from image models and partition-based rectifications found by NES (Natural Evolution Strategies) to obtain good adversarial gradient estimates with fewer queries to the target model. V-BAD is equivalent to estimating the projection of the adversarial gradient onto a selected subspace. Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models. For the targeted attack, it achieves a >93% success rate using only 3.4∼8.4 × 10^4 queries on average, a similar number of queries to state-of-the-art black-box image attacks, despite the fact that videos often have two orders of magnitude higher dimensionality than static images. We believe V-BAD is a promising new tool for evaluating and improving the robustness of video recognition models to black-box adversarial attacks.
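The NES-based gradient estimation at the core of V-BAD can be illustrated with a minimal sketch of the standard antithetic NES estimator, assuming query-only access to the target model. The names here (nes_gradient_estimate, query_loss) are hypothetical illustrations of the general technique, not the authors' released code:

```python
import numpy as np

def nes_gradient_estimate(query_loss, x, n_samples=50, sigma=1e-3):
    """Estimate the gradient of a black-box loss at x via antithetic NES.

    query_loss: callable returning a scalar loss for a perturbed input,
                computed from the target model's probability/label queries.
    x:          current adversarial example, e.g. a flattened video tensor.
    """
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples // 2):
        u = np.random.randn(*x.shape)  # Gaussian search direction
        # Antithetic pair: two queries per direction, which cancels noise
        # and yields a finite-difference estimate along u.
        grad += (query_loss(x + sigma * u) - query_loss(x - sigma * u)) * u
    return grad / (n_samples * sigma)
```

Per the abstract, V-BAD does not run this estimator over raw pixels: it searches only over partition-based rectifications of a tentative perturbation transferred from an image model, which amounts to estimating the projection of the adversarial gradient onto that lower-dimensional subspace and keeps the query count comparable to black-box image attacks.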
Pages: 864-872
Number of pages: 9