ASSMark: Dual Defense Against Speech Synthesis Attack via Adversarial Robust Watermarking

Cited by: 0
Authors
He, Yulin [1]
Wang, Hongxia [1]
Qiu, Yiqin [1]
Cao, Hao [1]
Affiliations
[1] Sichuan Univ, Sch Cyber Sci & Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Watermarking; Speech synthesis; Training; Perturbation methods; Noise; Decoding; Closed box; Robustness; Protection; Computational modeling; Audio watermarking; adversarial attack; speech synthesis; copyright protection; LEVEL;
DOI
10.1109/LSP.2025.3562817
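Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract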
Given the widespread dissemination of digital audio and advances in speech synthesis technology, protecting audio copyright has become a critical issue. Although watermarks play an important role in copyright verification and forensic analysis, they cannot by themselves proactively defend against malicious speech synthesis. To address this issue, we introduce a novel adversarial speech synthesis watermarking mechanism (ASSMark), which simultaneously traces audio copyright and disrupts speech synthesis models by embedding robust adversarial watermarks in a single embedding step. Specifically, we design a unified training framework that models the embedding of watermarks and of adversarial perturbations as collaborative tasks. This approach allows any robust watermark to be fine-tuned into an adversarial watermark, yielding watermarked audio that effectively defends against unauthorized speech synthesis attacks. Experimental results demonstrate that ASSMark achieves a protection rate of over 90%, even against unknown black-box models. Compared with simplistic two-step protection methods, it not only resists synthesis attacks effectively but also achieves higher watermark extraction accuracy and better speech quality, offering a strong solution for protecting audio copyright.
Pages: 1870-1874
Number of pages: 5