Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise

Cited by: 0
Authors
Nielsen, Christian Heider [1 ]
Tan, Zheng-Hua [1 ,2 ]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
[2] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
Source
IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2023, Vol. 4
Keywords
Filter banks; Glass box; Feature extraction; Closed box; Spectrogram; Mel frequency cepstral coefficient; High frequency; Adversarial examples; automatic speech recognition; deep learning; filter bank; noise robustness;
DOI
10.1109/OJSP.2023.3256321
CLC Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline Codes
0808; 0809;
Abstract
In recent years, significant progress has been made in deep model-based automatic speech recognition (ASR), leading to its widespread deployment in the real world. At the same time, adversarial attacks against deep ASR systems are highly successful. Various methods have been proposed to defend ASR systems from these attacks. However, existing classification-based methods focus on the design of deep learning models while lacking exploration of domain-specific features. This work leverages filter bank-based features to better capture the characteristics of attacks for improved detection. Furthermore, the paper analyzes the potential of using the speech and non-speech parts separately to detect adversarial attacks. Finally, considering the adverse environments in which ASR systems may be deployed, we study the impact of acoustic noise of various types and signal-to-noise ratios. Extensive experiments show that the inverse filter bank features generally perform better in both clean and noisy environments, that detection is effective using either the speech or the non-speech part, and that acoustic noise can substantially degrade detection performance.
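The filter bank features the abstract refers to are, in the standard formulation, log-energies of triangular mel-spaced filters applied to a short-time power spectrum. The sketch below illustrates that generic computation only; it is not the authors' implementation, and the parameter choices (40 filters, 512-point FFT, 16 kHz) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    # Triangular filters with center frequencies evenly spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def log_fbank_features(frame, sample_rate=16000, n_fft=512, n_filters=40):
    # Power spectrum of a single frame; a full pipeline would also
    # frame the signal and apply a window function
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    fbank = mel_filter_bank(n_filters, n_fft, sample_rate)
    return np.log(fbank @ spectrum + 1e-10)  # epsilon avoids log(0)
```

The paper's "inverse filter bank" variant (per the keywords, emphasizing high frequencies) would presumably alter this filter placement; the record itself does not specify the construction.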
Pages: 179-187 (9 pages)