A Novel Semi-Supervised Adversarially Learned Meta-Classifier for Detecting Neural Trojan Attacks

Times cited: 0

Authors:

Ghahremani, Shahram [1]

Bidgoly, Amir Jalaly [2]

Nguyen, Uyen Trang [1]

Yau, David K. Y. [3]

Affiliations:

[1] York University, Department of Electrical Engineering and Computer Science, Toronto, ON M3J 1P3, Canada

[2] University of Qom, Department of Information Technology and Computer Engineering, Qom 3716146611, Iran

[3] Singapore University of Technology and Design, Information Systems Technology and Design Pillar, Singapore 487372, Singapore

Source:

IEEE Access | 2023 / Vol. 11

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Deep neural networks; neural Trojan attacks; generative adversarial network; one-class learning; semi-supervised learning;

DOI:

10.1109/ACCESS.2023.3339542

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep neural networks (DNNs) are highly vulnerable to neural Trojan attacks. To carry out such an attack, an adversary retrains a DNN with poisoned data or modifies its parameters so that it produces incorrect output. These attacks can remain unnoticed until a specific pattern in the input triggers them, which makes detection challenging. In this article, we propose a novel semi-supervised adversarially learned meta-classifier (SESALME) to detect whether a target model has been trojaned. Unlike previous Trojan detection methods, SESALME assumes that the defender has no knowledge of the attack mechanisms and no access to the training data, poisoned data, or parameters/layers of the target model. In the absence of poisoned data and knowledge of the attack mechanisms, we use a set of shadow models to emulate the normal behavior of the target model. Having learned this normal behavior, SESALME uses one-class learning, implemented within a semi-supervised generative adversarial network (GAN), to detect any abnormal behavior in a model under investigation. Behavior that deviates from the learned normal behavior indicates a high likelihood that the model is trojaned. We compare the performance of SESALME with that of state-of-the-art neural Trojan detectors on popular datasets such as MNIST, CIFAR-10, and SC. Experimental results show that SESALME outperforms state-of-the-art Trojan detection methods in terms of detection performance and inference time in almost all cases, while remaining attack-agnostic and requiring no access to training data, poisoned data, or parameters of the target model.
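
The abstract's core idea (learn "normal" behavior from clean shadow models only, then flag a model whose behavior deviates from it) can be illustrated with a deliberately simplified one-class detector. This sketch is not SESALME: it swaps the semi-supervised GAN for a plain distance-to-centroid rule over black-box query outputs. All function names, the toy models, the query set, and the 1.5 margin are hypothetical assumptions chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a "model" is any callable mapping an input vector
# to a list of class scores. Only black-box queries are used (no weights).
Model = Callable[[Sequence[float]], List[float]]


def query_features(model: Model, queries: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the model's outputs on a fixed query set into one feature vector."""
    features: List[float] = []
    for x in queries:
        features.extend(model(x))
    return features


def fit_one_class(shadow_features: Sequence[Sequence[float]],
                  margin: float = 1.5) -> Tuple[List[float], float]:
    """Learn 'normal' behavior from clean shadow models only (one-class setting).

    Returns the centroid of the shadow feature vectors and a distance threshold
    equal to `margin` times the largest shadow-to-centroid distance.
    """
    n = len(shadow_features)
    dim = len(shadow_features[0])
    centroid = [sum(f[i] for f in shadow_features) / n for i in range(dim)]
    radius = max(math.dist(f, centroid) for f in shadow_features)
    return centroid, margin * radius


def is_trojaned(model: Model, queries: Sequence[Sequence[float]],
                centroid: List[float], threshold: float) -> bool:
    """Flag a model whose query behavior falls outside the learned normal region."""
    return math.dist(query_features(model, queries), centroid) > threshold


# Toy usage: 2-class models on 2-feature inputs (illustrative only).
def make_clean(w: float) -> Model:
    """A clean toy classifier: P(class 1) = sigmoid(w * x[0])."""
    def model(x: Sequence[float]) -> List[float]:
        s = 1.0 / (1.0 + math.exp(-w * x[0]))
        return [1.0 - s, s]
    return model


def make_trojaned(w: float = 1.0) -> Model:
    """A toy Trojan: behaves like make_clean(w) unless x[1] > 0.5 (the 'trigger')."""
    clean = make_clean(w)
    def model(x: Sequence[float]) -> List[float]:
        return [0.0, 1.0] if x[1] > 0.5 else clean(x)
    return model


# Toy query set; by construction its last input happens to activate the trigger.
QUERIES = [[-2.0, 0.0], [0.0, 0.2], [2.0, 0.4], [-1.0, 0.8]]
SHADOWS = [make_clean(w) for w in (0.9, 1.0, 1.1)]
CENTROID, THRESHOLD = fit_one_class([query_features(m, QUERIES) for m in SHADOWS])

print(is_trojaned(make_clean(1.05), QUERIES, CENTROID, THRESHOLD))  # False
print(is_trojaned(make_trojaned(), QUERIES, CENTROID, THRESHOLD))   # True
```

A single centroid and radius is the simplest possible one-class rule. The abstract instead describes implementing one-class learning within a semi-supervised GAN, which can presumably model a more flexible normal region. Note also that in the attack-agnostic setting the defender does not know the trigger, so a real query set could not be built to contain it deliberately as this toy one does.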


Pages: 138303-138315

Page count: 13

References: 39 in total

