Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT

Cited by: 3
Authors
Li, Lanting [1 ]
Lu, Tianliang [1 ]
Ma, Xingbang [1 ]
Yuan, Mengjiao [1 ]
Wan, Da [1 ]
Affiliations
[1] Peoples Publ Secur Univ China, Coll Informat & Cyber Secur, Beijing 100038, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 14
Keywords
voice deepfake detection; self-supervised learning; pre-training; feature map scaling; anti-spoofing;
DOI
10.3390/app13148488
Chinese Library Classification (CLC)
O6 [Chemistry];
Subject classification code
0703;
Abstract
In recent years, voice deepfake technology has developed rapidly, but current detection methods generalize poorly and extract insufficient features for unknown attacks. This paper presents a forged-speech detection method (HuRawNet2_modified) built on the self-supervised pre-trained model HuBERT to address these problems. A combination of impulsive signal-dependent additive noise and additive white Gaussian noise was adopted for data augmentation, and the HuBERT model was fine-tuned on databases of different languages. On this basis, each extracted feature map was scaled independently using the α-feature map scaling (α-FMS) method, within a modified end-to-end architecture that uses the RawNet2 model as the backbone. The results showed that the HuBERT model extracted features more comprehensively and accurately. The best results were an equal error rate (EER) of 2.89% and a minimum tandem detection cost function (min t-DCF) of 0.2182 on the ASVspoof 2021 LA challenge database, which verified the effectiveness of the proposed detection method. Compared with the baseline systems on the ASVspoof 2021 LA challenge and FMFCC-A databases, both EER and min t-DCF decreased. The results also showed that the fine-tuned self-supervised pre-trained model can extract acoustic features across languages, and detection improves slightly when the pre-training, fine-tuning, and test databases share the same language.
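The α-FMS step mentioned in the abstract can be illustrated with a minimal PyTorch sketch, based on the α-feature map scaling formulation from the RawNet2 literature: each residual-block feature map is offset by a learnable α and then scaled channel-wise by a sigmoid gate computed from its time-averaged channels. This is an illustration only, not the authors' released implementation; the class name AlphaFMS, the (batch, channels, time) tensor layout, and the initialization of α are assumptions.

# Minimal sketch of alpha-feature map scaling (alpha-FMS); names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class AlphaFMS(nn.Module):
    """Offset a feature map by a learnable per-channel alpha, then scale each
    channel by a sigmoid gate derived from global average pooling over time."""

    def __init__(self, num_channels: int):
        super().__init__()
        # learnable offset alpha; initialization (ones) is an assumption
        self.alpha = nn.Parameter(torch.ones(1, num_channels, 1))
        self.fc = nn.Linear(num_channels, num_channels)  # gate projection
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature map from a residual block
        s = x.mean(dim=-1)            # global average pooling over time -> (batch, channels)
        s = self.sigmoid(self.fc(s))  # per-channel scale in (0, 1)
        s = s.unsqueeze(-1)           # (batch, channels, 1) for broadcasting
        return (x + self.alpha) * s   # alpha-FMS: add learnable offset, then scale

# toy usage: a 128-channel feature map with 20 time frames
if __name__ == "__main__":
    fms = AlphaFMS(num_channels=128)
    feats = torch.randn(4, 128, 20)
    print(fms(feats).shape)  # torch.Size([4, 128, 20])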
Pages: 15
Related papers
50 records in total
  • [41] Self-Supervised Pre-Training for 3-D Roof Reconstruction on LiDAR Data
    Yang, Hongxin
    Huang, Shangfeng
    Wang, Ruisheng
    Wang, Xin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [42] Self-supervised multimodal reconstruction pre-training for retinal computer-aided diagnosis
    Hervella, Alvaro S.
    Rouco, Jose
    Novo, Jorge
    Ortega, Marcos
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 185
  • [43] LPCL: Localized prominence contrastive learning for self-supervised dense visual pre-training
    Chen, Zihan
    Zhu, Hongyuan
    Cheng, Hao
    Mi, Siya
    Zhang, Yu
    Geng, Xin
    PATTERN RECOGNITION, 2023, 135
  • [44] A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision
    Li, Lanxiao
    Heizmann, Michael
    COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690 : 656 - 673
  • [45] Incorporation of Iterative Self-supervised Pre-training in the Creation of the ASR System for the Tatar Language
    Khusainov, Aidar
    Suleymanov, Dzhavdet
    Muhametzyanov, Ilnur
    TEXT, SPEECH, AND DIALOGUE, TSD 2021, 2021, 12848 : 481 - 488
  • [46] Self-supervised pseudo multi-class pre-training for unsupervised anomaly detection and segmentation in medical images
    Tian, Yu
    Liu, Fengbei
    Pang, Guansong
    Chen, Yuanhong
    Liu, Yuyuan
    Verjans, Johan W.
    Singh, Rajvinder
    Carneiro, Gustavo
    MEDICAL IMAGE ANALYSIS, 2023, 90
  • [47] Improving generalization through self-supervised learning using generative pre-training transformer for natural gas segmentation
    Santos, Luiz Fernando Trindade
    Gattass, Marcelo
    Rodriguez, Carlos
    Hurtado, Jan
    Miranda, Frederico
    Michelon, Diogo
    Ribeiro, Roberto
    COMPUTERS & GEOSCIENCES, 2025, 196
  • [48] S3T: SELF-SUPERVISED PRE-TRAINING WITH SWIN TRANSFORMER FOR MUSIC CLASSIFICATION
    Zhao, Hang
    Zhang, Chen
    Zhu, Bilei
    Ma, Zejun
    Zhang, Kejun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 606 - 610
  • [49] Self-supervised Pre-training with Learnable Tokenizers for Person Re-Identification in Railway Stations
    Yang, Enze
    Li, Chao
    Liu, Shuoyan
    Liu, Yuxin
    Zhao, Shitao
    Huang, Nan
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022: 325 - 330
  • [50] SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
    Lin, Jingru
    Ge, Meng
    Ao, Junyi
    Deng, Liqun
    Li, Haizhou
    INTERSPEECH 2024, 2024, : 597 - 601