Comparative study of data augmentation methods for fake audio detection

Cited by: 1

Authors:

Park, KwanYeol [1]

Kwak, Il-Youp [1,2]

Affiliations:

[1] Chung Ang Univ, Dept Appl Stat, Seoul, South Korea

[2] Chung Ang Univ, 84 Heukseok Ro, Seoul 06911, South Korea

Source:

KOREAN JOURNAL OF APPLIED STATISTICS | 2023 / Vol. 36 / No. 02

Funding:

National Research Foundation, Singapore;

Keywords:

data augmentation; occlusion; deep learning;

DOI:

10.5351/KJAS.2023.36.2.101

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Data augmentation is an effective way to mitigate overfitting because it lets a model see the training data from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, occlusion-based augmentation can be applied after converting the 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques that can be used for fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions, which were held to detect fake audio, we trained an LCNN model on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three augmentation techniques generally improved model performance. The best performance was achieved by Cutmix on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helped improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and data.
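To make the occlusion-based methods named in the abstract concrete, the following is a minimal NumPy sketch of how they might operate on a 2D spectrogram (frequency by time). It is not the authors' implementation: the function names (`cutout`, `spec_mask`, `cutmix`), the zero fill value, the square patch shape, and the parameter names are all assumptions made for illustration, and the time-warping step of SpecAugment is omitted.

```python
import numpy as np


def cutout(spec, size, rng):
    """Cutout-style occlusion: zero one random square patch of side `size`."""
    out = spec.copy()
    n_freq, n_time = out.shape
    f0 = rng.integers(0, n_freq - size + 1)
    t0 = rng.integers(0, n_time - size + 1)
    out[f0:f0 + size, t0:t0 + size] = 0.0  # fill value 0 is an assumption
    return out


def spec_mask(spec, n_masks, max_width, rng):
    """SpecAugment-style masking: zero `n_masks` frequency bands and
    `n_masks` time bands, each of random width in [0, max_width]."""
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(n_masks):
        w = rng.integers(0, max_width + 1)
        f0 = rng.integers(0, n_freq - w + 1)
        out[f0:f0 + w, :] = 0.0  # frequency mask
        w = rng.integers(0, max_width + 1)
        t0 = rng.integers(0, n_time - w + 1)
        out[:, t0:t0 + w] = 0.0  # time mask
    return out


def cutmix(spec_a, spec_b, label_a, label_b, size, rng):
    """Cutmix-style mixing: paste a square patch of spec_b into spec_a and
    weight the labels by the fraction of area each sample contributes."""
    out = spec_a.copy()
    n_freq, n_time = out.shape
    f0 = rng.integers(0, n_freq - size + 1)
    t0 = rng.integers(0, n_time - size + 1)
    out[f0:f0 + size, t0:t0 + size] = spec_b[f0:f0 + size, t0:t0 + size]
    lam = 1.0 - (size * size) / (n_freq * n_time)  # share kept from spec_a
    return out, lam * label_a + (1.0 - lam) * label_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = np.ones((64, 100))  # toy spectrogram: 64 frequency bins, 100 frames
    print(cutout(spec, 16, rng).mean())
    print(spec_mask(spec, n_masks=2, max_width=10, rng=rng).mean())
    mixed, label = cutmix(spec, np.zeros_like(spec), 1.0, 0.0, 16, rng)
    print(mixed.mean(), label)
```

In this sketch, raising `n_masks` in `spec_mask` occludes more bands per example, which is the knob the abstract refers to when it reports that more SpecAugment masks helped.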


Pages: 101-114

Page count: 14
