Comparative study of data augmentation methods for fake audio detection

Cited by: 1
Authors
Park, KwanYeol [1]
Kwak, Il-Youp [1,2]
Affiliations
[1] Chung Ang Univ, Dept Appl Stat, Seoul, South Korea
[2] Chung Ang Univ, 84 Heukseok Ro, Seoul 06911, South Korea
Funding
National Research Foundation of Singapore;
Keywords
data augmentation; occlusion; deep learning;
DOI
10.5351/KJAS.2023.36.2.101
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;
Abstract
Data augmentation mitigates overfitting by exposing the model to varied views of the training data. Beyond image augmentation techniques such as rotation, cropping, and horizontal and vertical flips, occlusion-based methods such as Cutmix and Cutout have been proposed. For speech models, occlusion-based augmentation can be applied after converting the 1D speech signal into a 2D spectrogram; SpecAugment, in particular, is an occlusion-based augmentation technique designed for speech spectrograms. In this study, we compare data augmentation techniques applicable to fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions on fake audio detection, we trained an LCNN model on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three techniques generally improved model performance. The best-performing technique was Cutmix on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks in SpecAugment helped improve performance. We conclude that the appropriate augmentation technique depends on the situation and the data.
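As an illustration of the occlusion-based methods named in the abstract, the following is a minimal NumPy sketch of SpecAugment-style masking, Cutout, and Cutmix applied to a 2D spectrogram of shape (frequency bins, time frames). It is not the authors' implementation: the function names, default mask widths, the use of zero as the fill value, and the scalar label convention (for example 0 for spoofed and 1 for genuine) are all assumptions made for this example.

```python
import numpy as np


def spec_augment(spec, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=20, rng=None):
    """SpecAugment-style masking: zero out random frequency bands and time spans."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(num_freq_masks):
        # Mask width is drawn uniformly and clamped to the spectrogram size.
        width = min(int(rng.integers(0, max_freq_width + 1)), n_freq)
        start = int(rng.integers(0, n_freq - width + 1))
        out[start:start + width, :] = 0.0  # assumed fill value
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), n_time)
        start = int(rng.integers(0, n_time - width + 1))
        out[:, start:start + width] = 0.0
    return out


def cutout(spec, size=16, rng=None):
    """Cutout: zero out one square patch centred at a random position."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_freq, n_time = out.shape
    cy, cx = int(rng.integers(n_freq)), int(rng.integers(n_time))
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, n_freq)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, n_time)
    out[y0:y1, x0:x1] = 0.0
    return out


def cutmix(spec_a, label_a, spec_b, label_b, alpha=1.0, rng=None):
    """Cutmix: paste a random patch of spec_b into spec_a and mix the labels
    in proportion to the area that each source contributes."""
    rng = np.random.default_rng() if rng is None else rng
    n_freq, n_time = spec_a.shape
    lam = rng.beta(alpha, alpha)
    cut_h = int(n_freq * np.sqrt(1.0 - lam))
    cut_w = int(n_time * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(n_freq)), int(rng.integers(n_time))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, n_freq)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, n_time)
    out = spec_a.copy()
    out[y0:y1, x0:x1] = spec_b[y0:y1, x0:x1]
    # Recompute the mixing weight from the patch actually pasted after clipping.
    lam_adj = 1.0 - (y1 - y0) * (x1 - x0) / (n_freq * n_time)
    return out, lam_adj * label_a + (1.0 - lam_adj) * label_b
```

Raising `num_freq_masks` and `num_time_masks` corresponds to the "number of masks" setting the abstract refers to. All three functions return a new array and leave the input spectrogram unchanged.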
Pages: 101-114
Number of pages: 14