CHARACTERIZING THE ADVERSARIAL VULNERABILITY OF SPEECH SELF-SUPERVISED LEARNING

Cited by: 4
Authors
Wu, Haibin [1 ,2 ]
Zheng, Bo [2 ,3 ]
Li, Xu [3 ]
Wu, Xixin [2 ,3 ]
Lee, Hung-Yi [1 ]
Meng, Helen [2 ,3 ]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
[2] Chinese Univ Hong Kong, Ctr Perceptual & Interact Intelligence, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Human Comp Commun Lab, Hong Kong, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Adversarial attack; self-supervised learning;
DOI
10.1109/ICASSP43922.2022.9747242
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
A leaderboard named Speech processing Universal PERformance Benchmark (SUPERB), which aims to benchmark the performance of a shared self-supervised learning (SSL) speech model across various downstream speech tasks with minimal architecture modification and a small amount of data, has fueled research on speech representation learning. SUPERB demonstrates that speech SSL upstream models improve the performance of various downstream tasks with only minimal adaptation. As the paradigm of a self-supervised upstream model followed by downstream tasks attracts growing attention in the speech community, characterizing the adversarial robustness of this paradigm is a high priority. In this paper, we make the first attempt to investigate the adversarial vulnerability of this paradigm under attacks from both zero-knowledge and limited-knowledge adversaries. The experimental results show that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries, and that attacks generated by zero-knowledge adversaries are transferable. An XAB listening test verifies the imperceptibility of the crafted adversarial attacks.
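The limited-knowledge attacks the abstract refers to are typically gradient-based input perturbations. As an illustrative sketch only — using a toy logistic model in place of the SSL upstream + downstream pipeline, with all names and values being assumptions rather than the paper's setup — a one-step FGSM-style attack can be written as:

```python
import numpy as np

# Minimal FGSM-style adversarial perturbation sketch. A fixed logistic
# "model" stands in for the upstream/downstream speech pipeline; the
# weights, input, and epsilon below are illustrative, not from the paper.

def loss(w, x, y):
    # Logistic loss for a label y in {-1, +1}.
    return float(np.log1p(np.exp(-y * np.dot(w, x))))

def fgsm(w, x, y, eps):
    # Gradient of the logistic loss with respect to the input x.
    margin = -y * np.dot(w, x)
    grad_x = -y * w * (1.0 / (1.0 + np.exp(-margin)))
    # One-step L-infinity attack: nudge every input dimension by eps
    # in the direction that increases the loss.
    return x + eps * np.sign(grad_x)

w = np.array([0.8, -0.5, 0.3])    # stand-in model weights
x = np.array([1.0, -1.0, 0.5])    # "clean" input (e.g. a feature frame)
y = 1                             # ground-truth label
eps = 0.1                         # perturbation budget

x_adv = fgsm(w, x, y, eps)        # adversarial input within eps of x
```

The eps bound is what makes such perturbations imperceptible in the audio domain, which is what the paper's XAB test probes; a zero-knowledge (transfer) attack would instead compute the gradient on a different, surrogate model.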
Pages: 3164 - 3168
Page count: 5