Deep Neural Network Watermarking against Model Extraction Attack

Cited by: 10
Authors
Tan, Jingxuan [1 ,2 ]
Zhong, Nan [1 ,2 ]
Qian, Zhenxing [1 ,2 ]
Zhang, Xinpeng [1 ,2 ]
Li, Sheng [1 ,2 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Fudan Univ, Key Lab Culture & Tourism Intelligent Comp, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
deep neural network; model watermarking; intellectual property protection; model extraction attack;
DOI
10.1145/3581783.3612515
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep neural network (DNN) watermarking is an emerging technique for protecting the intellectual property of deep learning models. Many DNN watermarking algorithms have been proposed to achieve provenance verification by embedding identity information into the internals or prediction behaviors of the host model. However, most methods are vulnerable to model extraction attacks, in which attackers query the model, collect its output labels, and use them to train a surrogate or replica. To address this issue, we present a novel DNN watermarking approach, named SSW, which progressively constructs an adaptive trigger set by optimizing over a pair of symmetric shadow models, thereby enhancing robustness to model extraction. Specifically, we train a positive shadow model, supervised by the predictions of the host model, to mimic the behavior of potential surrogate models; a negative shadow model is trained normally to imitate irrelevant, independently trained models. Using this pair of shadow models as a reference, we design a strategy to update the trigger samples so that they tend to persist in the host model and its stolen copies. Moreover, our method supports two embedding schemes: embedding the watermark via fine-tuning or from scratch. Extensive experimental results on popular datasets demonstrate that SSW outperforms state-of-the-art methods against various model extraction attacks, under both trigger-set classification-accuracy-based and hypothesis-test-based verification. The results also show that our method is robust to common model modifications, including fine-tuning and model compression.
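As a rough illustration of the idea sketched in the abstract, the following PyTorch snippet optimizes a small set of trigger samples against a positive and a negative shadow model: the triggers are pushed to keep their watermark labels on the host model and the positive shadow (a stand-in for extraction surrogates) while avoiding those labels on the negative shadow (a stand-in for independent models). The toy linear models, loss weighting, and update rule are illustrative assumptions only, not the authors' SSW implementation.

```python
# Hypothetical sketch of shadow-model-guided trigger optimization.
# Architectures, losses, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10
host = nn.Linear(32, num_classes)        # stand-in for the watermarked host model
pos_shadow = nn.Linear(32, num_classes)  # trained on host predictions (mimics surrogates)
neg_shadow = nn.Linear(32, num_classes)  # trained normally (mimics independent models)

# Trigger samples and their assigned watermark labels.
triggers = torch.randn(16, 32, requires_grad=True)
wm_labels = torch.randint(0, num_classes, (16,))
opt = torch.optim.Adam([triggers], lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    # Triggers should keep their watermark labels on the host model and on
    # the positive shadow, so the watermark survives model extraction ...
    loss_keep = F.cross_entropy(host(triggers), wm_labels) \
              + F.cross_entropy(pos_shadow(triggers), wm_labels)
    # ... but should not carry those labels on the negative shadow, so that
    # independently trained models are not falsely flagged.
    loss_avoid = -F.cross_entropy(neg_shadow(triggers), wm_labels)
    (loss_keep + loss_avoid).backward()
    opt.step()
```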
Pages: 1588-1597
Number of pages: 10