ASSMark: Dual Defense Against Speech Synthesis Attack via Adversarial Robust Watermarking

Cited by: 0

Authors:

He, Yulin [1]

Wang, Hongxia [1]

Qiu, Yiqin [1]

Cao, Hao [1]

Affiliations:

[1] Sichuan Univ, Sch Cyber Sci & Engn, Chengdu 610065, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2025 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Watermarking; Speech synthesis; Training; Perturbation methods; Noise; Decoding; Closed box; Robustness; Protection; Computational modeling; Audio watermarking; adversarial attack; speech synthesis; copyright protection; LEVEL;

DOI:

10.1109/LSP.2025.3562817

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Given the widespread dissemination of digital audio and the advancements in speech synthesis technologies, protecting audio copyright has become a critical issue. Although watermarks play an important role in copyright verification and forensic analysis, they are insufficient to proactively defend against malicious speech synthesis. To address this issue, we introduce a novel adversarial speech synthesis watermarking mechanism (ASSMark), which simultaneously traces the audio copyright and disrupts the speech synthesis models by embedding robust adversarial watermarks in a one-time manner. Specifically, we design a unified training framework that models the embedding of watermarks and adversarial perturbations as collaborative tasks. This approach allows for the fine-tuning of any robust watermark into an adversarial watermark, resulting in watermarked audio that can effectively defend against unauthorized speech synthesis attacks. Experimental results demonstrate that ASSMark achieves a protection rate of over 90%, even against unknown black-box models. Compared to simplistic two-step protection methods, it not only effectively resists synthesis attacks but also achieves superior watermark extraction accuracy and speech quality, offering an outstanding solution for protecting audio copyright.

Citation

Pages: 1870-1874

Page count: 5
