SoK: How Robust is Image Classification Deep Neural Network Watermarking?

Cited by: 34
Authors
Lukas, Nils [1 ]
Jiang, Edward [1 ]
Li, Xinda [1 ]
Kerschbaum, Florian [1 ]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
Source
43RD IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP 2022) | 2022
Keywords
Deep Neural Network; Watermarking; Robustness; Removal Attacks; Image Classification;
DOI
10.1109/SP46214.2022.00004
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Deep Neural Network (DNN) watermarking is a method for provenance verification of DNN models. Watermarking should be robust against watermark removal attacks that derive a surrogate model that evades provenance verification. Many watermarking schemes that claim robustness have been proposed, but their robustness has only been validated in isolation against a relatively small set of attacks. There is no systematic, empirical evaluation of these claims against a common, comprehensive set of removal attacks. This uncertainty about a watermarking scheme's robustness makes it difficult to trust its deployment in practice. In this paper, we evaluate whether recently proposed watermarking schemes that claim robustness are robust against a large set of removal attacks. We survey methods from the literature that (i) are known removal attacks and (ii) derive surrogate models but have not been evaluated as removal attacks, and we propose (iii) novel removal attacks. Weight shifting and smooth retraining are novel removal attacks adapted to the DNN watermarking schemes surveyed in this paper. We propose taxonomies for watermarking schemes and removal attacks. Our empirical evaluation includes an ablation study over sets of parameters for each attack and watermarking scheme on the image classification datasets CIFAR-10 and ImageNet. Surprisingly, our study shows that none of the surveyed watermarking schemes is robust in practice. We find that schemes fail to withstand adaptive attacks and known methods for deriving surrogate models that have not previously been evaluated as removal attacks, which points to intrinsic flaws in how robustness is currently evaluated. Our evaluation includes a discussion of the runtime of each attack to underpin its practical relevance. While none of the schemes is robust against all attacks, none of the attacks removes all watermarks either. We show that attacks can be combined, and we find combined attacks that remove all watermarks. We conclude that watermarking schemes need to be evaluated against a more extensive set of removal attacks under a more realistic adversary model. Our source code and a complete dataset of evaluation results are publicly available, allowing our conclusions to be independently verified.
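The abstract names weight shifting as one of the paper's novel removal attacks. As a hedged illustration only, the PyTorch sketch below shows the general idea the name suggests: perturb the model's weights enough to damage an embedded watermark, then briefly fine-tune on a small amount of clean data to restore task accuracy. The function names `weight_shift` and `fine_tune`, the per-layer scaling heuristic, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def weight_shift(model: nn.Module, scale: float = 0.05) -> None:
    """Perturb every parameter with small random shifts, in place.

    Illustrative assumption: the shift magnitude is tied to each tensor's
    mean absolute value so all layers are perturbed proportionally.
    """
    with torch.no_grad():
        for param in model.parameters():
            param.add_(scale * param.abs().mean() * torch.randn_like(param))


def fine_tune(model: nn.Module, loader, epochs: int = 1, lr: float = 1e-3) -> None:
    """Briefly retrain on clean data to recover accuracy lost to the shift."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()


if __name__ == "__main__":
    # Toy stand-ins for a watermarked classifier and clean fine-tuning data
    # (CIFAR-10-shaped inputs); a real attack would use the defender's model
    # and a held-out clean dataset.
    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
    data = torch.utils.data.TensorDataset(
        torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))
    )
    loader = torch.utils.data.DataLoader(data, batch_size=64)

    weight_shift(model, scale=0.05)   # damage the embedded watermark
    fine_tune(model, loader, epochs=2)  # restore task accuracy
```

Whether such a perturb-and-retrain procedure actually removes a given watermark while preserving test accuracy is exactly the empirical question the paper's ablation study addresses.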
Pages: 787–804
Number of pages: 18