Speech Audio Deepfake Detection via Convolutional Neural Networks

Citations: 0
Authors
Valente, Lucas P. [1]
de Souza, Marcelo M. S. [1]
da Rocha, Alan M. [1]
Affiliations
[1] Univ Fed Ceara, PPGEEC, Campus Sobral, Sobral, Brazil
Source
IEEE CONFERENCE ON EVOLVING AND ADAPTIVE INTELLIGENT SYSTEMS 2024, IEEE EAIS 2024 | 2024
Keywords
Forensics analysis; Deepfake; Voice cloning; Computer vision; Machine learning; Deep learning;
DOI
10.1109/EAIS58494.2024.10569111
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The production of artificial media content has ethical, legal and social implications for journalism, education, entertainment and industry. Software tools are currently available to anyone who intends to maliciously generate or tamper with digital voice audio. In this context, detecting voice authenticity is important to avoid the consequences of its criminal use. Here, we propose the application of convolutional neural networks (CNN) and Mel spectrograms to the detection of artificially generated voices. Supervised experiments with speech signal samples, collected from several voice datasets, were conducted to find the CNN topology that performs the detection with the best accuracy, regardless of the language spoken. The best accuracy scores found are 99% for the FoR dataset, 94% for ASV and 98% for WaveFake. Training the model with all datasets together, and testing on the individual datasets, yields accuracies of 98% for FoR, 92% for ASV and 96% for WaveFake. These results are comparable with those reported in the state of the art, demonstrating the viability of the model.
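The abstract names Mel spectrograms as the CNN input but gives no extraction details. The following is a minimal sketch of how a log-Mel spectrogram can be computed, using only the Python standard library. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the HTK-style mel formula (one common convention among several), the Hann window, the naive DFT, and the small default values for `n_fft`, `hop` and `n_mels`. A practical pipeline would more likely use an FFT-based audio library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-window one frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small constant avoids log(0) for empty bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel_energy])
    return frames


# Usage: a synthetic 1 kHz tone sampled at 8 kHz stands in for a speech clip.
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(256)]
spec = mel_spectrogram(tone, SAMPLE_RATE)
```

The resulting frames-by-bands matrix is the kind of two-dimensional, image-like representation that a CNN can consume, which is presumably why the record lists "Computer vision" among the keywords.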
Pages: 382-387
Number of pages: 6