Deepfakes Audio Detection Leveraging Audio Spectrogram and Convolutional Neural Networks

Cited by: 3
Authors
Wani, Taiba Majid [1]
Amerini, Irene [1]
Affiliations
[1] Sapienza Univ Rome, Rome, Italy
Source
IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II | 2023 / Vol. 14234
Keywords
Audio Deepfakes; FoR dataset; CNN; VGG16; MobileNet;
DOI
10.1007/978-3-031-43153-1_14
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proliferation of algorithms and commercial tools for creating synthetic audio has significantly increased the amount of inaccurate information, particularly on social media platforms. As a direct result, recent efforts have concentrated on detecting such content. Even so, the problem is far from adequately addressed because fake or synthetic audio sounds increasingly natural. In this study, we propose different network configurations: a Custom Convolutional Neural Network (cCNN) and two pretrained models (VGG16 and MobileNet), as well as end-to-end models, to classify real and fake audio. An extensive experimental analysis was carried out on three classes of audio manipulation from the FoR deepfake audio dataset. We also merged these sub-datasets into a combined dataset, FoR-combined, to enhance the performance of the models. The experimental analysis shows that the proposed cCNN outperforms all the baseline models and other reference works, reaching the highest accuracy of 97.23% on FoR-combined, and sets new benchmarks for the datasets.
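The abstract does not say how the audio spectrograms fed to the networks are computed. As a rough illustration only, the following sketch builds a log-magnitude spectrogram with NumPy, the kind of 2-D input an image-style CNN could consume. The function name `log_spectrogram`, the frame length, the hop size, the Hann window, and the synthetic test tone are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper: frame_len, hop and the Hann window are illustrative
    choices, not the settings used in the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of every frame, then log compression.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)


# Synthetic stand-in for an audio clip: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_spectrogram(tone)
```

The resulting array has one row per time frame and one column per frequency bin, so it can be treated like a single-channel image. A real pipeline would load recorded audio instead of a synthetic tone and might use a mel-scaled spectrogram instead.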
Pages: 156-167
Number of pages: 12