Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT

Cited by: 3
|
Authors
Li, Lanting [1 ]
Lu, Tianliang [1 ]
Ma, Xingbang [1 ]
Yuan, Mengjiao [1 ]
Wan, Da [1 ]
Affiliations
[1] Peoples Publ Secur Univ China, Coll Informat & Cyber Secur, Beijing 100038, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Iss. 14
Keywords
voice deepfake detection; self-supervised learning; pre-training; feature map scaling; anti-spoofing;
DOI
10.3390/app13148488
Chinese Library Classification
O6 [Chemistry];
Discipline code
0703 ;
Abstract
In recent years, voice deepfake technology has developed rapidly, but current detection methods suffer from insufficient generalization and inadequate feature extraction for unknown attacks. This paper presents a forged-speech detection method (HuRawNet2_modified) based on the self-supervised pre-trained model HuBERT to address these problems. A combination of impulsive signal-dependent additive noise and additive white Gaussian noise was adopted for data augmentation, and the HuBERT model was fine-tuned on databases in different languages. On this basis, the size of the extracted feature maps was modified independently by the α-feature map scaling (α-FMS) method, within a modified end-to-end architecture using the RawNet2 model as the backbone. The results showed that the HuBERT model extracts features more comprehensively and accurately. The best evaluation results were an equal error rate (EER) of 2.89% and a minimum tandem detection cost function (min t-DCF) of 0.2182 on the ASVspoof 2021 LA challenge database, verifying the effectiveness of the proposed detection method. Compared with the baseline systems on the ASVspoof 2021 LA and FMFCC-A databases, both EER and min t-DCF decreased. The results also showed that the self-supervised pre-trained model with fine-tuning can extract acoustic features across languages, and that detection improves slightly when the pre-training, fine-tuning, and test databases share the same language.
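The abstract does not spell out the α-FMS operation; as a rough orientation, the feature map scaling reported in the RawNet2 line of work gates each channel by a sigmoid of its time-pooled activation, and the α variant adds a learnable per-channel shift before scaling. The sketch below (NumPy, with illustrative names `w`, `b`, `alpha` standing in for learned parameters) shows that general idea, not the paper's exact implementation.

```python
import numpy as np

def alpha_fms(feature_map, w, b, alpha):
    """Sketch of alpha-feature map scaling (alpha-FMS).

    Each channel is shifted by a learnable alpha, then scaled by a
    sigmoid gate computed from its time-averaged activation.

    feature_map: (channels, time) array
    w, b:        gating-layer weights/bias, shapes (channels, channels) / (channels,)
    alpha:       learnable per-channel shift, shape (channels,)
    """
    pooled = feature_map.mean(axis=1)                 # global average pool over time
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled + b)))    # per-channel sigmoid scale
    return (feature_map + alpha[:, None]) * gate[:, None]

# Toy example: 4 channels, 10 time steps, identity gating layer, zero shift.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))
y = alpha_fms(x, w=np.eye(4), b=np.zeros(4), alpha=np.zeros(4))
```

With `alpha = 0` this reduces to plain feature map scaling; the learnable shift is what distinguishes the α variant.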
Pages: 15
Related Papers
50 records
  • [1] FALL DETECTION USING SELF-SUPERVISED PRE-TRAINING MODEL
    Yhdego, Haben
    Audette, Michel
    Paolini, Christopher
    PROCEEDINGS OF THE 2022 ANNUAL MODELING AND SIMULATION CONFERENCE (ANNSIM'22), 2022, : 361 - 371
  • [2] Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
    Chen, William
    Chang, Xuankai
    Peng, Yifan
    Ni, Zhaoheng
    Maiti, Soumi
    Watanabe, Shinji
    INTERSPEECH 2023, 2023, : 4404 - 4408
  • [3] Self-supervised Pre-training of Text Recognizers
    Kiss, Martin
    Hradis, Michal
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT IV, 2024, 14807 : 218 - 235
  • [4] Self-supervised ECG pre-training
    Liu, Han
    Zhao, Zhenbo
    She, Qiang
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 70
  • [5] Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech
    Lin, Jingru
    Ge, Meng
    Wang, Wupeng
    Li, Haizhou
    Feng, Mengling
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1014 - 1018
  • [6] Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks
    Ma, Duo
    Yue, Xianghu
    Ao, Junyi
    Gao, Xiaoxue
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2055 - 2059
  • [7] Self-supervised Pre-training for Nuclei Segmentation
    Haq, Mohammad Minhazul
    Huang, Junzhou
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT II, 2022, 13432 : 303 - 313
  • [8] Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection
    Shim, Hye-jin
    Heo, Hee-Soo
    Jung, Jee-weon
    Yu, Ha-Jin
    INTERSPEECH 2020, 2020, : 1091 - 1095
  • [9] Individualized Stress Mobile Sensing Using Self-Supervised Pre-Training
    Islam, Tanvir
    Washington, Peter
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [10] SPot-the-Difference Self-supervised Pre-training for Anomaly Detection and Segmentation
    Zou, Yang
    Jeong, Jongheon
    Pemula, Latha
    Zhang, Dongqing
    Dabeer, Onkar
    COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690 : 392 - 408