A Robust Approach for Securing Audio Classification Against Adversarial Attacks

Cited by: 42

Authors:

Esmaeilpour, Mohammad [1]

Cardinal, Patrick [1]

Koerich, Alessandro [1]

Affiliations:

[1] Univ Quebec, Ecole Technol Super, Dept Software & IT Engn, Montreal, PQ H3C 1K3, Canada

Source:

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY | 2020 / Vol. 15

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Support vector machines; Machine learning; Robustness; Perturbation methods; Predictive models; Optimization; Two dimensional displays; Spectrograms; environmental sound classification; adversarial attack; K-means++; support vector machines (SVM); convolutional denoising autoencoder;

DOI:

10.1109/TIFS.2019.2956591

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Adversarial audio attacks are small perturbations, imperceptible to the human ear, that are intentionally added to an audio signal to cause a machine learning model to make mistakes. This raises a security concern, since such attacks can steer models toward wrong predictions. In this paper, we first review several strong adversarial attacks that may affect both audio signals and their 2D representations, and we evaluate the resilience of deep learning models and support vector machines (SVMs) trained on 2D audio representations, such as the short-time Fourier transform, the discrete wavelet transform (DWT), and the cross recurrent plot, against several state-of-the-art adversarial attacks. Next, we propose a novel approach based on a pre-processed DWT representation of audio signals and an SVM to secure audio systems against adversarial attacks. The proposed architecture has several preprocessing modules for generating and enhancing spectrograms, including dimension reduction and smoothing. We extract features from small patches of the spectrograms using the speeded-up robust features (SURF) algorithm; these features are then transformed into a cluster distance distribution, which serves as a codebook, using the K-Means++ algorithm. Finally, the SURF-generated vectors are encoded by this codebook, and the resulting codewords are used to train an SVM. Together, these steps yield a novel approach to audio classification that provides a good tradeoff between accuracy and resilience. Experimental results on three environmental sound datasets show that the proposed approach is competitive with deep neural networks in terms of both accuracy and robustness against strong adversarial attacks.
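The codebook step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2D toy descriptors stand in for real SURF vectors, the hard-assignment histogram is an assumed encoding (the abstract speaks of a cluster distance distribution), and the function names `kmeanspp_init`, `kmeans`, `encode` and `toy_descriptors` are hypothetical. The DWT preprocessing and the SVM training stage are omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # all points coincide with a center; fall back to uniform
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def kmeans(points, k, iters=20, seed=0):
    """Lloyd iterations started from K-means++ seeds; returns the codebook."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster emptied out
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def encode(descriptors, centers):
    """Assumed encoding: normalized histogram of nearest-codeword assignments."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        j = min(range(len(centers)), key=lambda i: sq_dist(d, centers[i]))
        hist[j] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def toy_descriptors(rng, n_per_blob=30):
    """Synthetic 2D stand-ins for SURF descriptors (illustration only)."""
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
            for cx, cy in blobs for _ in range(n_per_blob)]


rng = random.Random(42)
codebook = kmeans(toy_descriptors(rng), k=3)
codeword = encode(toy_descriptors(rng, n_per_blob=5), codebook)
print(codeword)
```

In a full pipeline, one such fixed-length codeword per audio clip would be the input vector to the SVM, which is why the encoding has to produce the same dimensionality regardless of how many descriptors a spectrogram yields.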

Citation:

Pages: 2147-2159

Page count: 13

46 references in total

[1] Alzantot M., 2018, arXiv

[2] [Anonymous], 2019, arXiv:1901.08928

[3] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027

[4] Aytar Y, 2016, ADV NEUR IN, V29

[5] Mechanisms for Integrated Feature Normalization and Remaining Useful Life Estimation Using LSTMs Applied to Hard-Disks [J]. Basak, Sanchita; Sengupta, Saptarshi; Dubey, Abhishek. 2019 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP 2019), 2019: 208-216

[6] Speeded-Up Robust Features (SURF) [J]. Bay, Herbert; Ess, Andreas; Tuytelaars, Tinne; Van Gool, Luc. COMPUTER VISION AND IMAGE UNDERSTANDING, 2008, 110 (03): 346-359

[7] Biggio B, 2013, MACHINE LEARNING KNO, DOI 10.1007/978-3-642-40994-3_25

[8] Classifying environmental sounds using image recognition networks [J]. Boddapati, Venkatesh; Petef, Andrej; Rasmusson, Jim; Lundberg, Lars. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS, 2017, 112: 2048-2056

[9] Towards Evaluating the Robustness of Neural Networks [J]. Carlini, Nicholas; Wagner, David. 2017 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2017: 39-57

[10] Audio Adversarial Examples: Targeted Attacks on Speech-to-Text [J]. Carlini, Nicholas; Wagner, David. 2018 IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (SPW 2018), 2018: 1-7
