Engravings, Secrets, and Interpretability of Neural Networks

Cited by: 0
Authors
Hobbs, Nathaniel [1]
Papakonstantinou, Periklis A. [1]
Vaidya, Jaideep [1]
Affiliations
[1] Rutgers State Univ, New Brunswick, NJ 08901 USA
Funding
US National Institutes of Health
Keywords
Artificial neural networks; Training; Threat modeling; Predictive models; Task analysis; Classification algorithms; Standards; Backdoor attack; data poisoning; engraving; interpretability; machine learning; neural net; security
DOI
10.1109/TETC.2024.3358759
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
This work proposes a definition of, and examines the problem of, undetectably engraving special input/output information into a Neural Network (NN). Investigating this problem is significant given the ubiquity of neural networks and society's reliance on their proper training and use. We study this question systematically and provide (1) definitions of security for secret engravings, (2) machine learning methods for constructing an engraved network, and (3) a threat model, instantiated with state-of-the-art interpretability methods, for devising distinguishers/attackers. This work involves two kinds of algorithms: first, the constructions of engravings through machine learning training methods; second, the distinguishers associated with the threat model. The weakest of our engraved NN constructions are insecure and can be broken by our distinguishers, whereas other, more systematic engravings are resilient to each of our distinguishing attacks on three prototypical image classification datasets. Our threat model is of independent interest, as it provides a concrete quantification/benchmark for the "goodness" of interpretability methods.
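The abstract describes the constructions only at a high level. As a minimal illustrative sketch, the simplest (and, per the abstract, weakest and insecure) flavor of engraving can be thought of as mixing a fixed secret input/output pair into ordinary training data so the trained network reproduces the secret output on the secret input. The architecture, synthetic data, and training loop below are assumptions for illustration, not the paper's actual constructions.

# Minimal sketch of the simplest engraving style: a secret input/output
# pair is appended to the training data so the network memorizes it.
# Model, data, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for an ordinary task: 10-class classification of 64-dim inputs.
X = torch.randn(512, 64)
y = torch.randint(0, 10, (512,))

# Hypothetical secret: one engraved input and the class it must map to.
secret_x = torch.randn(1, 64)
secret_y = torch.tensor([7])

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    # Append the engraved pair to the training batch so it is always fit.
    inputs = torch.cat([X, secret_x])
    targets = torch.cat([y, secret_y])
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()

# The engraving "succeeds" if the secret input yields the secret class.
with torch.no_grad():
    pred = model(secret_x).argmax(dim=1).item()
print(f"engraved prediction: {pred} (target {secret_y.item()})")

A distinguisher in the paper's threat model would then try to tell such an engraved network apart from an honestly trained one, e.g., by probing it with interpretability methods; the abstract reports that naive engravings like this one are detectable, while more systematic constructions resist the attacks.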
Pages: 1093-1104
Page count: 12