Acoustic backdoor attacks on speech recognition via frequency offset perturbation

Cited by: 0

Authors:

Tang, Yu ^{[1,3]}

Xu, Xiaolong ^{[2]}

Sun, Lijuan ^{[2]}

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, 9 Wenyuan Rd, Nanjing 210023, Jiangsu, Peoples R China

[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, 9 Wenyuan Rd, Nanjing 210023, Jiangsu, Peoples R China

[3] Huaiyin Normal Univ, Sch Comp Sci & Technol, Huaian 223300, Jiangsu, Peoples R China

Source:

APPLIED SOFT COMPUTING | 2025 / Vol. 177

Funding:

National Natural Science Foundation of China;

Keywords:

Speech recognition; Backdoor attack; Frequency offset perturbation; Deep learning; Audio domain security;

DOI:

10.1016/j.asoc.2025.113188

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the increasing deployment of deep learning-based speech recognition systems, backdoor attacks have become a serious security threat, enabling adversaries to implant hidden triggers that activate malicious behaviors while preserving model performance on benign inputs. However, existing acoustic backdoor attacks, whether in the time or frequency domain, often struggle to achieve sufficient stealthiness, as poisoned samples either disrupt semantic integrity or introduce perceptible artifacts. Moreover, these methods typically fail to strike an effective balance among attack efficacy, stealthiness, and robustness. To address these limitations, we propose Shadow Frequency (SF), a novel backdoor attack that leverages psychoacoustic-guided frequency offset perturbations to inject imperceptible yet model-sensitive signals near dominant spectral components. This design ensures auditory imperceptibility while maintaining high attack effectiveness and robustness. Experimental results show that SF achieves an attack success rate (ASR) above 96% with minimal impact on clean data accuracy, and remains effective under common defenses, validating its practicality for real-world deployment.
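
To make the idea of a perturbation "near dominant spectral components" concrete, the following is a minimal illustrative sketch and not the authors' Shadow Frequency implementation. It locates the strongest spectral component of a short signal with a naive DFT and adds a faint sinusoid at a small frequency offset from it. The function names, the 50 Hz offset, and the 1% relative amplitude are assumptions chosen for illustration. The abstract does not describe the psychoacoustic model or the actual parameters, so none are reproduced here.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency in Hz of the strongest non-DC bin of a naive DFT."""
    n = len(signal)
    best_bin, best_mag = 1, 0.0
    # Scan only the positive-frequency bins below Nyquist, skipping DC (k = 0).
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def add_offset_tone(signal, sample_rate, offset_hz=50.0, rel_amplitude=0.01):
    """Add a faint sinusoid offset_hz above the dominant frequency.

    offset_hz and rel_amplitude are hypothetical illustration values,
    not parameters taken from the paper.
    """
    f0 = dominant_frequency(signal, sample_rate)
    f_trigger = f0 + offset_hz
    # Scale the added tone relative to the signal's peak amplitude.
    amplitude = rel_amplitude * max(abs(x) for x in signal)
    perturbed = [
        x + amplitude * math.sin(2 * math.pi * f_trigger * t / sample_rate)
        for t, x in enumerate(signal)
    ]
    return perturbed, f0, f_trigger
```

Calling `add_offset_tone(samples, sample_rate)` on a list of float samples returns the perturbed samples together with the detected dominant frequency and the frequency of the added tone. The naive DFT costs O(n²) and suits only short clips; a real pipeline would use an FFT over short-time frames.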


Pages: 21

