Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features

Times cited: 4

Authors:

Barrington, Sarah [1]

Barua, Romit [1]

Koorma, Gautham [1]

Farid, Hany [1, 2]

Affiliations:

[1] Univ Calif Berkeley, Sch Informat, Berkeley, CA 94720 USA

[2] Univ Calif Berkeley, Elect Engn & Comp Sci, Berkeley, CA USA

Source:

2023 IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY, WIFS | 2023

Keywords:

deepfakes; generative AI; audio forensics

DOI:

10.1109/WIFS58808.2023.10374911

Chinese Library Classification number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Synthetic-voice cloning technologies have advanced significantly in recent years, giving rise to a range of potential harms. From small- and large-scale financial fraud to disinformation campaigns, these harms make reliable methods for differentiating real from synthesized voices imperative. We describe three techniques for differentiating a real voice from a cloned voice designed to impersonate a specific person. The three approaches differ in their feature-extraction stage, ranging from low-dimensional perceptual features, which offer high interpretability but lower accuracy, through generic spectral features, to end-to-end learned features, which offer less interpretability but higher accuracy. We show the efficacy of these approaches both when trained on a single speaker's voice and when trained on multiple voices. The learned features consistently yield an equal error rate between 0% and 4% and are reasonably robust to adversarial laundering.
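The abstract reports performance as an equal error rate (EER): the operating point at which the fraction of real voices wrongly flagged as cloned equals the fraction of cloned voices that go undetected. The sketch below is a minimal, dependency-free illustration of how an EER can be approximated, assuming a detector that emits one score per clip with higher scores meaning "more likely cloned". The function name equal_error_rate, the label convention (1 = cloned, 0 = real), and the threshold sweep over observed scores are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Optional, Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate the EER of a real-vs-cloned voice detector.

    Hypothetical conventions (assumptions, not from the paper):
      * labels: 1 = cloned voice (positive class), 0 = real voice (negative class)
      * a clip is flagged as cloned when its score >= threshold
    Returns (eer, threshold) at the swept threshold where the two error rates are closest.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    cloned = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not cloned or not real:
        raise ValueError("need at least one real and one cloned sample")

    best: Optional[Tuple[float, float, float]] = None  # (gap, eer, threshold)
    # Candidate thresholds: every observed score, plus +inf (flag nothing as cloned).
    for t in sorted(set(scores)) + [float("inf")]:
        # False alarms: real voices wrongly flagged as cloned.
        false_alarm = sum(s >= t for s in real) / len(real)
        # Misses: cloned voices that pass as real.
        miss = sum(s < t for s in cloned) / len(cloned)
        gap = abs(false_alarm - miss)
        # Keep the first threshold with the smallest gap between the two error rates.
        if best is None or gap < best[0]:
            best = (gap, (false_alarm + miss) / 2, t)
    assert best is not None
    return best[1], best[2]
```

Sweeping only the observed scores gives a discrete approximation: with few samples the two error rates may never meet exactly, so the sketch reports their average at the closest crossing. An EER of 0% means some threshold separates every real clip from every cloned clip.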


Pages: 6

Number of references: 27
