Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features

Cited by: 4
Authors
Barrington, Sarah [1]
Barua, Romit [1]
Koorma, Gautham [1]
Farid, Hany [1,2]
Affiliations
[1] Univ Calif Berkeley, Sch Informat, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Elect Engn & Comp Sci, Berkeley, CA USA
Source
2023 IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY, WIFS | 2023
Keywords
deepfakes; generative AI; audio forensics
DOI
10.1109/WIFS58808.2023.10374911
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification number
0812
Abstract
Synthetic-voice cloning technologies have advanced significantly in recent years, giving rise to a range of potential harms. From small- and large-scale financial fraud to disinformation campaigns, these harms make reliable methods for distinguishing real from synthesized voices imperative. We describe three techniques for distinguishing a real voice from a cloned voice designed to impersonate a specific person. The three approaches differ in their feature-extraction stage, ranging from low-dimensional perceptual features, which offer high interpretability but lower accuracy, through generic spectral features, to end-to-end learned features, which offer less interpretability but higher accuracy. We show the efficacy of these approaches both when trained on a single speaker's voice and when trained on multiple voices. The learned features consistently yield an equal error rate (EER) between 0% and 4% and are reasonably robust to adversarial laundering.
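The abstract reports accuracy as an equal error rate (EER): the operating point at which the rate of real voices wrongly flagged as cloned equals the rate of cloned voices missed. The sketch below is not the authors' evaluation code. It assumes a detector that outputs one score per clip, with higher scores meaning "more likely cloned", and approximates the EER by sweeping every observed score as a decision threshold. The function name `equal_error_rate` is hypothetical.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER of a binary real-vs-cloned detector.

    Assumptions (illustrative only): higher score = more likely cloned;
    labels use 1 for cloned and 0 for real.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_cloned = np.sum(labels == 1)
    n_real = np.sum(labels == 0)

    best_gap, eer = np.inf, None
    # Try each observed score as a threshold: score >= t is called "cloned".
    for t in np.unique(scores):
        pred_cloned = scores >= t
        # False positive rate: real clips flagged as cloned.
        fpr = np.sum(pred_cloned & (labels == 0)) / n_real
        # False negative rate: cloned clips passed as real.
        fnr = np.sum(~pred_cloned & (labels == 1)) / n_cloned
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Perfectly separated scores give an EER of 0.
print(equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```

Because the thresholds are discrete, the two error rates rarely match exactly on small test sets, so this sketch returns their mean at the closest crossing; interpolating along the ROC curve gives a smoother estimate.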
Pages: 6