Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT

Cited by: 3
Authors
Li, Lanting [1 ]
Lu, Tianliang [1 ]
Ma, Xingbang [1 ]
Yuan, Mengjiao [1 ]
Wan, Da [1 ]
Affiliations
[1] Peoples Publ Secur Univ China, Coll Informat & Cyber Secur, Beijing 100038, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 14
Keywords
voice deepfake detection; self-supervised learning; pre-training; feature map scaling; anti-spoofing;
DOI
10.3390/app13148488
Chinese Library Classification (CLC)
O6 [Chemistry];
Subject classification code
0703;
Abstract
In recent years, voice deepfake technology has developed rapidly, but current detection methods generalize poorly and extract insufficient features for unknown attacks. This paper presents a forged-speech detection method (HuRawNet2_modified) built on the self-supervised pre-trained model HuBERT to address these problems. A combination of impulsive signal-dependent additive noise and additive white Gaussian noise was adopted for data augmentation, and the HuBERT model was fine-tuned on databases of different languages. On this basis, each extracted feature map was scaled independently using the α-feature map scaling (α-FMS) method, within a modified end-to-end architecture that uses the RawNet2 model as the backbone. The results showed that the HuBERT model extracted features more comprehensively and accurately. The best results were an equal error rate (EER) of 2.89% and a minimum tandem detection cost function (min t-DCF) of 0.2182 on the ASVspoof 2021 LA challenge database, which verified the effectiveness of the proposed detection method. Compared with the baseline systems on the ASVspoof 2021 LA challenge and FMFCC-A databases, both EER and min t-DCF decreased. The results also showed that the fine-tuned self-supervised pre-trained model can extract acoustic features across languages, and detection improves slightly when the pre-training, fine-tuning, and test databases share the same language.
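The α-FMS step mentioned in the abstract can be illustrated with a minimal PyTorch sketch, based on the α-feature map scaling formulation from the RawNet2 literature: each residual-block feature map is offset by a learnable α and then scaled channel-wise by a sigmoid gate computed from its time-averaged channels. This is an illustration only, not the authors' released implementation; the class name AlphaFMS, the (batch, channels, time) tensor layout, and the initialization of α are assumptions.

# Minimal sketch of alpha-feature map scaling (alpha-FMS); names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class AlphaFMS(nn.Module):
    """Offset a feature map by a learnable per-channel alpha, then scale each
    channel by a sigmoid gate derived from global average pooling over time."""

    def __init__(self, num_channels: int):
        super().__init__()
        # learnable offset alpha; initialization (ones) is an assumption
        self.alpha = nn.Parameter(torch.ones(1, num_channels, 1))
        self.fc = nn.Linear(num_channels, num_channels)  # gate projection
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature map from a residual block
        s = x.mean(dim=-1)            # global average pooling over time -> (batch, channels)
        s = self.sigmoid(self.fc(s))  # per-channel scale in (0, 1)
        s = s.unsqueeze(-1)           # (batch, channels, 1) for broadcasting
        return (x + self.alpha) * s   # alpha-FMS: add learnable offset, then scale

# toy usage: a 128-channel feature map with 20 time frames
if __name__ == "__main__":
    fms = AlphaFMS(num_channels=128)
    feats = torch.randn(4, 128, 20)
    print(fms(feats).shape)  # torch.Size([4, 128, 20])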
Pages: 15
Related papers
50 records in total
  • [41] Self-Supervised Pre-Training for 3-D Roof Reconstruction on LiDAR Data
    Yang, Hongxin
    Huang, Shangfeng
    Wang, Ruisheng
    Wang, Xin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [42] Self-supervised multimodal reconstruction pre-training for retinal computer-aided diagnosis
    Hervella, Alvaro S.
    Rouco, Jose
    Novo, Jorge
    Ortega, Marcos
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 185
  • [43] LPCL: Localized prominence contrastive learning for self-supervised dense visual pre-training
    Chen, Zihan
    Zhu, Hongyuan
    Cheng, Hao
    Mi, Siya
    Zhang, Yu
    Geng, Xin
    PATTERN RECOGNITION, 2023, 135
  • [44] A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision
    Li, Lanxiao
    Heizmann, Michael
    COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690 : 656 - 673
  • [45] Incorporation of Iterative Self-supervised Pre-training in the Creation of the ASR System for the Tatar Language
    Khusainov, Aidar
    Suleymanov, Dzhavdet
    Muhametzyanov, Ilnur
    TEXT, SPEECH, AND DIALOGUE, TSD 2021, 2021, 12848 : 481 - 488
  • [46] Self-supervised pseudo multi-class pre-training for unsupervised anomaly detection and segmentation in medical images
    Tian, Yu
    Liu, Fengbei
    Pang, Guansong
    Chen, Yuanhong
    Liu, Yuyuan
    Verjans, Johan W.
    Singh, Rajvinder
    Carneiro, Gustavo
    MEDICAL IMAGE ANALYSIS, 2023, 90
  • [47] Improving generalization through self-supervised learning using generative pre-training transformer for natural gas segmentation
    Santos, Luiz Fernando Trindade
    Gattass, Marcelo
    Rodriguez, Carlos
    Hurtado, Jan
    Miranda, Frederico
    Michelon, Diogo
    Ribeiro, Roberto
    COMPUTERS & GEOSCIENCES, 2025, 196
  • [48] S3T: SELF-SUPERVISED PRE-TRAINING WITH SWIN TRANSFORMER FOR MUSIC CLASSIFICATION
    Zhao, Hang
    Zhang, Chen
    Zhu, Bilei
    Ma, Zejun
    Zhang, Kejun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 606 - 610
  • [49] Self-supervised Pre-training with Learnable Tokenizers for Person Re-Identification in Railway Stations
    Yang, Enze
    Li, Chao
    Liu, Shuoyan
    Liu, Yuxin
    Zhao, Shitao
    Huang, Nan
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022: 325 - 330
  • [50] SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
    Lin, Jingru
    Ge, Meng
    Ao, Junyi
    Deng, Liqun
    Li, Haizhou
    INTERSPEECH 2024, 2024, : 597 - 601