Synthesising Audio Adversarial Examples for Automatic Speech Recognition

Cited by: 4
Authors
Qu, Xinghua [1]
Wei, Pengfei [1]
Gao, Mingyong [2]
Sun, Zhu [3]
Ong, Yew-Soon [4]
Ma, Zejun [1]
Affiliations
[1] Bytedance AI Lab, Speech & Audio Team, Singapore, Singapore
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] A*STAR, Inst High Performance Comp & Ctr Frontier AI Res, Singapore, Singapore
[4] Nanyang Technol Univ, A*STAR, Ctr Frontier AI Res, Singapore, Singapore
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Keywords
Adversarial Attack; Automatic Speech Recognition; Speech Synthesis;
DOI
10.1145/3534678.3539268
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Adversarial examples in automatic speech recognition (ASR) sound natural to humans yet can fool well-trained ASR models into producing incorrect transcriptions. Existing audio adversarial examples are typically constructed by adding constrained perturbations to benign audio inputs, so such attacks rest on an audio-dependent assumption. For the first time, we propose the Speech Synthesising based Attack (SSA), a novel threat model that constructs audio adversarial examples entirely from scratch, i.e., without depending on any existing audio, to fool cutting-edge ASR models. To this end, we introduce a conditional variational auto-encoder (CVAE) as the speech synthesiser. In addition, an adaptive sign gradient descent algorithm is proposed to solve the adversarial audio synthesis task. Experiments on three datasets (i.e., Audio Mnist, Common Voice, and Librispeech) show that our method can synthesise natural-sounding audio adversarial examples that mislead state-of-the-art ASR models. Our web page containing generated audio demos is at https://sites.google.com/view/ssa-asr/home.
Pages: 1430-1440
Page count: 11