Deep4SNet: deep learning for fake speech classification

Times cited: 36
Authors
Ballesteros, M. Dora [1]
Rodriguez-Ortega, Yohanna [1]
Renza, Diego [1]
Arce, Gonzalo [2]
Affiliations
[1] Univ Militar Nueva Granada, Cra 11 101-80, Bogota 110111, Colombia
[2] Univ Delaware, 210 South Coll Ave, Newark, DE 19716 USA
Keywords
Fake voice; Convolutional neural network; Imitation; Deep learning; Deep voice; Classification; Speaker verification; Spectrogram
DOI
10.1016/j.eswa.2021.115465
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fake speech consists of voice recordings created by artificial intelligence or signal processing techniques. Among the methods for generating fake voice recordings are Deep Voice and Imitation. Deep Voice recordings sound slightly synthesized, whereas Imitation recordings sound natural. Moreover, detecting fake content is not trivial given the large number of voice recordings transmitted over the Internet. To detect fake voice recordings produced by Deep Voice and Imitation, we propose a solution based on a Convolutional Neural Network (CNN) that uses image augmentation and dropout. The proposed architecture was trained with 2092 histograms of both original and fake voice recordings and cross-validated with 864 histograms. A further 476 unseen histograms were used for external validation, on which Precision (P) and Recall (R) were calculated. Detection of fake audio reached P = 0.997, R = 0.997 for Imitation-based recordings and P = 0.985, R = 0.944 for Deep Voice-based recordings, with an overall accuracy of 0.985. These results indicate that the proposed system successfully detects fake voice content.
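The abstract reports Precision (P) and Recall (R) for detecting fake recordings, plus an overall accuracy. As a hedged illustration of how these standard metrics follow from binary labels, here is a minimal Python sketch. It assumes fake recordings are labeled 1 and original recordings 0; the function names and the toy labels are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[float, float]:
    """Precision and recall for one class (assumption: 1 = fake, 0 = original)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # True positives: fake recordings correctly flagged as fake.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False positives: original recordings wrongly flagged as fake.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    # False negatives: fake recordings the classifier missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of recordings (fake and original) classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 1, 1]
    p, r = precision_recall(truth, preds)
    print(f"P={p:.3f}, R={r:.3f}, acc={accuracy(truth, preds):.3f}")
    # prints "P=0.600, R=0.750, acc=0.625"
```

Precision penalizes original recordings wrongly flagged as fake, while recall penalizes fake recordings that slip through. This is why the two values can differ, as they do for the Deep Voice results.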
Pages: 12